Abstract

We  investigate  absolute  bounds  (or  inequalities)  on  the  mean  and  standard  deviation  of  transformed  data 
values,  given  only  a  few  statistics  on  the  original  set  of  data  values.  Our  work  applies  primarily  to 
transformation  functions  whose  derivatives  are  constant-sign  for  a  positive  range  (e.g.  logarithm,  antilog, 
square  root,  and  reciprocal).  With  such  functions  we  can  often  get  reasonably  tight  absolute  bounds,  so  that 
distributional  assumptions  about  the  data  needed  for  confidence  intervals  can  be  eliminated.  We  investigate  a 
variety  of  methods  for  obtaining  such  bounds,  first  examining  bounding  curves  which  are  straight  lines,  then 
those  that  are  quadratic  polynomials.  While  the  problem  of  finding  the  best  quadratic  bound  is  an 
optimization  problem  with  no  closed-form  solution,  we  display  a  variety  of  closed-form  quadratic  bounds 
which  can  come  close  to  the  optimal  solution.  We  emphasize  what  can  be  done  with  prior  knowledge  of  the 
mean  and  standard  deviation  of  the  untransformed  data  values,  but  do  address  some  other  statistics  too. 
Abbreviated title: Absolute Bounds on Statistics of Transformed Values


1. Introduction

Standard transformations of numeric data values such as logarithm, antilog, square root, square, cube, and
reciprocal are frequently appropriate as a prelude to statistical analysis of finite data sets [7]. Sometimes,
however, the data are already aggregated into counts and means, and the original data values lost. This
happens when the original data is too large to handle and/or contains sensitive information, as with the
U.S. Census, which publishes much of its data as aggregates. We may also deliberately create "database
abstracts" of aggregate statistics to facilitate quick statistical estimates by "antisampling" methods [10].
Statistics on the transformed values cannot be calculated uniquely when the original data is so preaggregated.
But if we are doing exploratory data analysis [13, 6], an estimate of a statistic on the transformed data may be
all that we need. We address one set of methods for obtaining such estimates, by finding absolute
(unconditionally guaranteed) bounds on the mean and standard deviation for data under some common
transformations.

Absolute bounds are the only true "nonparametric" form of estimate, and as such have advantages.
Compared to "reasonable-guess" estimates [9], bias of the estimator need not be dealt with, while the
bounds still provide numbers close to the true answer for this category of problems. As [7] discusses,
confidence intervals for the mean and standard deviation of transformed data are difficult to obtain and
the methods are subject to exceptions, so easily obtained absolute bounds are appealing. Tight enough
absolute bounds can be equivalent to a good estimate. An estimate of a statistic can also be logically incorrect
when bounds are tight, i.e. it may not be a statistic of any possible distribution consistent with the constraints.
Bounds are useful for other reasons as well. Some algorithms exploit only bounds, as do the "branch and
bound" methods of [4] for retrieval of information from a database. We have investigated other advantages in
previous work [10, 11, 12]. In addition, the mathematics of absolute bounds is straightforward and requires
only elementary calculus.

Our approach is to give a variety of bounds formulae for the same estimation situation. In general, we do
not know which of several bounding methods will be the best for a problem, and this suggests the program
architecture of an artificial-intelligence "production system" [1]. We can combine results by taking the
minimum of all the upper bounds, and the maximum of all the lower bounds.
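
The combination rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `combine_bounds` and the representation of each bound as a (lower, upper) pair are assumptions made here, not part of the original method description. The two sample pairs are the linear and shifted-quadratic bounds on the mean of ln(x) quoted in sections 3.2 and 5.3.

```python
from typing import Iterable, Tuple


def combine_bounds(bounds: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Intersect several (lower, upper) bound pairs on the same quantity."""
    pairs = list(bounds)
    if not pairs:
        raise ValueError("need at least one (lower, upper) pair")
    lower = max(lo for lo, _ in pairs)  # tightest lower bound
    upper = min(hi for _, hi in pairs)  # tightest upper bound
    if lower > upper:
        raise ValueError("bounds are mutually inconsistent")
    return lower, upper


# Linear bounds (2.635, 3.135) and shifted-quadratic bounds (2.94, 6.76)
# on the mean of ln(x) from the running example.
print(combine_bounds([(2.635, 3.135), (2.94, 6.76)]))  # (2.94, 3.135)
```

Each bounding method can then be added or removed independently, which is what makes a rule-based arrangement attractive.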


Even if the data is transformed before being aggregated, there are still many reasons to want statistics on the untransformed data. To
use the example of [7], it is useful to study rainfall in the cube root of inches, but one may then be interested in statistics on the cube of
that, the meaningful quantity of total volume.


2.  Our  approach 

In  this  work  we  examine  transformation  functions  whose  derivatives  have  a  constant  sign  in  the  interval  of 
study.  (We  may  be  able  to  relax  this  restriction  in  particular  cases,  however;  usually  only  a  constant-sign 
second  derivative  is  necessary.  Chapter  3  of  [5]  discusses  detailed  restrictions,  in  particular  the  notion  of 
function  convexity,  for  the  material  we  cover  in  section  3  below.)  The  so-called  "power  transformations"  and 
their  inverses  [2]  satisfy  this  constant-sign  restriction  for  positive  data  values.  Six  common  power 
transformations  are  log,  antilog,  square  root,  square,  cube,  and  reciprocal,  and  these  will  be  our  primary 
examples.  Logarithm  is  particularly  important  because  the  mean  of  the  logs  is  the  log  of  the  geometric  mean 
of  a  set  of  data  values;  reciprocal  is  also  important  because  it  provides  the  key  to  handling  quotients  of 
random  variables.  To  summarize  the  six  example  transformations: 

Function      first deriv.    second deriv.    steepest point

ln(x)         +               -                left side
e^x           +               +                right side
sqrt(x)       +               -                left side
x^2           +               +                right side
x^3           +               +                right side
1/x           -               +                left side

We  shall  assume  the  following  statistics  on  the  original  (untransformed)  data  values  are  known: 

•  $\mu$, the mean of the values (or equivalently, the sum of the values and the number of values)

•  m,  the  minimum  of  the  values 

•  M,  the  maximum  of  the  values 

Even  when  we  do  not  know  the  minimum  and  maximum  exactly,  we  can  often  assume  extreme  "safe"  values 
which  the  minimum  cannot  be  less  than  and  the  maximum  cannot  be  greater  than,  and  which  we  can  use  in 
our  formulae.  So  it  is  reasonable  to  believe  we  can  always  come  up  with  a  minimum  and  maximum  for  a  set 
of  values. 

In  much  of  what  follows  we  also  assume  the  following  is  known: 

•  $\sigma$, the standard deviation of the values, defined as $\sqrt{\sum_{1 \le i \le n} (x_i - \mu)^2 / n}$, instead of the more
conventional formula with a denominator of n-1

Note we use the symbols $\mu$ and $\sigma$ to emphasize that we are considering finite data populations, which are not
necessarily samples of anything.

We shall ignore linear transformations of variables as a preliminary to applying power functions, since
these can be handled trivially. For instance, $f(x) = \ln(ax + b)$ can be analyzed by defining $y = ax + b$ and
analyzing $g(y) = \ln(y)$, where $\mu_y = a\mu_x + b$ and $\sigma_y = |a|\sigma_x$.

Our basic idea is to find functions that are (a) entirely above, and (b) entirely below the curve of the
function on the data-value interval. We shall consider two important cases: bounding curves that are straight
lines (sections 3 and 4) and bounding curves that are second-degree polynomials (quadratics) (sections 5, 6, 7,
and 8). Subsequent sections consider extensions to this framework: use of subset means and standard
deviations in section 9, use of order statistics in section 10, use of distribution fits in section 11, and adjustments
for small populations in section 12. We conclude with some simple test experiments in section 13.

3.  Linear  bounds  on  the  mean 

3.1. Overview

For straight lines, one bounding line can be a tangent to the curve at some point (for convenience, the mean); the
other a secant through the curve at the minimum and the maximum. For curves with negative second
derivative like logarithm and square root, the tangent is an upper bound, the secant a lower bound; for curves
with positive second derivative like antilog and reciprocal, the tangent is the lower bound and the secant the
upper. These bounding lines map directly into bounds on the mean and standard deviation, for note that if
$ax + b \ge f(x)$ for all x in a range, with f some transformation function satisfying our restrictions and E denoting
expected value, then

$E(ax + b) \ge E(f(x))$, or
$aE(x) + b \ge E(f(x))$, or
$a\mu + b \ge E(f(x))$

$E(f(x))$ being the quantity we are interested in bounding.

3.2.  Linear  bounds  on  the  mean 

Let us apply these ideas to the mean of transformed values (see figure 3-1). The tangent to f(x) at $\mu$ has
equation

$y = x f'(\mu) + [f(\mu) - \mu f'(\mu)]$

This leads to a well-known bound (generalized in [5], p. 70):

$\mu f'(\mu) + [f(\mu) - \mu f'(\mu)] = f(\mu)$

On the other side of the curve, the secant through the maximum and minimum forms a bound. This line has
equation

$y = x\,[(f(M)-f(m))/(M-m)] + [f(m) - m\,[(f(M)-f(m))/(M-m)]]$

which corresponds to the bound

$\mu\,[(f(M)-f(m))/(M-m)] + [f(m) - m\,[(f(M)-f(m))/(M-m)]]$
$= (\mu-m)[(f(M)-f(m))/(M-m)] + f(m)$
$= (1-\alpha)f(m) + \alpha f(M) = f(m) + \alpha(f(M)-f(m))$
where $\alpha = (\mu-m)/(M-m)$

Figure 3-1: Linear bounds on the mean of transformed values

To give an example, if a set of data values ranges from 10 to 100, and the mean is 23, the mean of the
logarithms of the data values has

an upper bound of ln(23) = 3.135
a lower bound of (77/90) ln(10) + (13/90) ln(100) = 2.635

Hence the geometric mean of the original data values is between $e^{2.635} = 13.9$ and $e^{3.135} = 23$. In general
from these formulae, the geometric mean is between $m(M/m)^{\alpha}$ and $\mu$; and the harmonic mean is between
$1/[1/m + 1/M - \mu/(mM)]$ and $\mu$, where $\alpha = (\mu-m)/(M-m)$.
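
The tangent and secant bounds on the mean can be evaluated mechanically. The following is a minimal sketch; the function name `linear_mean_bounds` and the `concave` flag are assumptions introduced here for illustration, and the caller is assumed to know the sign of the second derivative on the interval.

```python
import math
from typing import Callable, Tuple


def linear_mean_bounds(f: Callable[[float], float], m: float, M: float,
                       mu: float, concave: bool) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the mean of f(x) given m, M and mu.

    `concave` is True when f'' < 0 on [m, M] (e.g. ln, sqrt) and False
    when f'' > 0 (e.g. exp, square, cube, reciprocal on positive values).
    """
    if not (m < M and m <= mu <= M):
        raise ValueError("require m < M and m <= mu <= M")
    alpha = (mu - m) / (M - m)
    tangent = f(mu)                             # tangent line at mu, evaluated at mu
    secant = (1 - alpha) * f(m) + alpha * f(M)  # secant through endpoints, at mu
    return (secant, tangent) if concave else (tangent, secant)


lo, hi = linear_mean_bounds(math.log, 10, 100, 23, concave=True)
print(round(lo, 3), round(hi, 3))                      # 2.635 3.135
print(round(math.exp(lo), 1), round(math.exp(hi), 1))  # geometric-mean bounds: 13.9 23.0
```

Passing `lambda x: 1 / x` with `concave=False` gives the corresponding bounds on the mean of the reciprocals, from which the harmonic-mean bounds follow by inversion.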

3.3.  Proof  that  tangent  at  the  mean  is  optimal 

Note that the bound obtained from taking the tangent at $\mu$ is optimal for the conditions we are assuming on
f. To see this, suppose we use the tangent at some other point t, i.e. the line $y = f(t) + (x-t)f'(t)$. Then the
mean on this bound line is

$E[f(t) + (x-t)f'(t)] = f(t) + (\mu-t)f'(t)$

Now we want to find the extreme value of this as t varies, so we take the derivative with respect to t and set it
equal to zero:

$f'(t) - f'(t) + (\mu-t)f''(t) = 0 = (\mu-t)f''(t)$

But since we assumed that f had a constant-sign second derivative in the interval of interest, the only way this
can be zero is if $\mu = t$. Hence the only extreme value for the bound will be when we take a tangent at $\mu$: a
minimum for downwards-curving functions, and a maximum for upwards-curving.
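
A quick numerical check of this optimality claim for f = ln on the running example (m = 10, M = 100, mean 23) is sketched below. The grid of candidate tangent points and the helper name `tangent_bound` are choices made here for illustration only.

```python
import math


def tangent_bound(t: float, mu: float) -> float:
    """Mean of the tangent line to ln(x) at t: f(t) + (mu - t) f'(t)."""
    return math.log(t) + (mu - t) / t


mu = 23.0
candidates = [10 + 0.5 * k for k in range(181)]  # t = 10.0, 10.5, ..., 100.0

# ln curves downwards, so every tangent gives an upper bound and the
# best choice of t is the one giving the smallest value.
best_t = min(candidates, key=lambda t: tangent_bound(t, mu))
print(best_t, round(tangent_bound(best_t, mu), 3))  # 23.0 3.135
```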

3.4.  Miscellaneous  comments 

In the case of a negative second derivative, the tangent bound is an upper bound, and the secant bound a
lower bound; otherwise, the reverse. Note the two bounds are related, because they can be rewritten as

$f((1-\alpha)m + \alpha M)$ and
$(1-\alpha)f(m) + \alpha f(M)$, where $\alpha = (\mu-m)/(M-m)$

so they represent interchanging of a weighting and functional application.

Here is a table of the linear bounds for our six common transformations:


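Function       Upper mean bound                  Lower mean bound

natural log    $\ln(\mu)$                        $(1-\alpha)\ln(m) + \alpha\ln(M)$
antilog        $(1-\alpha)e^m + \alpha e^M$      $e^{\mu}$
square root    $\sqrt{\mu}$                      $(1-\alpha)\sqrt{m} + \alpha\sqrt{M}$
square         $(1-\alpha)m^2 + \alpha M^2$      $\mu^2$
cube           $(1-\alpha)m^3 + \alpha M^3$      $\mu^3$
reciprocal     $(1-\alpha)/m + \alpha/M$         $1/\mu$

where $\alpha = [\mu-m]/[M-m]$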

3.5.  Accuracy  of  linear  mean  bounds 

To illustrate effectiveness of the bounds, we tabulate the bounds for m = 10, M = 100, f = ln, and for
$\mu$ = 19, 28, 37, 46, 55, 64, 73, 82, and 91. The "bounds range fraction" is the ratio of the distance between the
bounds to the total range of the function on the values, the difference between f(M) and f(m); it indicates the
quality of the estimate.

mean ($\mu$)    upper bound    lower bound    bounds range fraction

19             2.944          2.533          .179
28             3.332          2.763          .247
37             3.611          2.993          .268
46             3.829          3.224          .263
55             4.007          3.454          .240
64             4.159          3.684          .206
73             4.290          3.914          .163
82             4.407          4.145          .114
91             4.511          4.374          .059

It is typical that the estimates are best for extreme $\mu$, and the error is worst for a particular value inside the
range. We can calculate this value. Assume f has negative second derivative (the other case is analogous).
Then we want to find the maximum of the function representing the difference of the tangent and secant
bounds, or

$g(\mu) = f(\mu) - (1-\alpha)f(m) - \alpha f(M)$, where $\alpha = (\mu-m)/(M-m)$

We find this by setting to zero the derivative with respect to $\mu$, in other words

$dg(\mu)/d\mu = 0 = f'(\mu) + f(m)/(M-m) - f(M)/(M-m)$
$f'(\mu) = (f(M)-f(m))/(M-m)$

In other words, for any function f that satisfies our conditions, the maximum error occurs for a mean at
the point where the tangent to f is parallel to the secant through the endpoints. This makes sense because this
is the point at which f stops "turning away" from the secant and begins turning back towards it. Note by
Rolle's Theorem there is always at least one such point where the lines are parallel, and the constant sign of the
second derivative ensures that there is never more than one such point.

For specific f we can tabulate the point of maximum error from this formula, as a function of m and M.

Function              Worst $\mu$

natural log (ln x)    $(M-m)/\ln(M/m)$
antilog (e^x)         $\ln[(e^M - e^m)/(M-m)]$
square root           $(M-m)^2/(4[\sqrt{M} - \sqrt{m}]^2)$
square                $(M+m)/2$
cube                  $\sqrt{(m^2 + mM + M^2)/3}$
reciprocal            $\sqrt{mM}$

The maximum error may then be obtained as $|f(\mu) - f(m) - (\mu-m)(f(M)-f(m))/(M-m)|$, with $\mu$ set to the
worst value from the table.
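
As a cross-check on the tabulated worst-case means, the sketch below evaluates each closed form and computes the resulting gap between the tangent and secant bounds for the logarithm. The function names `worst_mu_closed_form` and `gap_fraction` are assumptions introduced here; the square-root entry is written in the algebraically equivalent form $(\sqrt{M}+\sqrt{m})^2/4$.

```python
import math
from typing import Callable, Dict


def worst_mu_closed_form(m: float, M: float) -> Dict[str, float]:
    """Mean at which the tangent and secant bounds are furthest apart."""
    return {
        "ln": (M - m) / math.log(M / m),
        "exp": math.log((math.exp(M) - math.exp(m)) / (M - m)),
        "sqrt": (math.sqrt(M) + math.sqrt(m)) ** 2 / 4,
        "square": (M + m) / 2,
        "cube": math.sqrt((m * m + m * M + M * M) / 3),
        "reciprocal": math.sqrt(m * M),
    }


def gap_fraction(f: Callable[[float], float], m: float, M: float, mu: float) -> float:
    """Distance between the two linear bounds, as a fraction of |f(M) - f(m)|."""
    alpha = (mu - m) / (M - m)
    gap = abs(f(mu) - (1 - alpha) * f(m) - alpha * f(M))
    return gap / abs(f(M) - f(m))


worst = worst_mu_closed_form(10.0, 100.0)
print(round(worst["ln"], 2))                                   # 39.09
print(round(gap_fraction(math.log, 10.0, 100.0, worst["ln"]), 3))
```

For the logarithm on [10, 100] the worst mean falls between the tabulated rows for 37 and 46, consistent with the bounds range fractions shown in the table above.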

3.6.  Bounds  on  the  standard  deviation,  given  mean 

A simple application of the linear bounds on the mean of transformed values is to bounding the standard
deviation of a set of values given only the maximum (M), minimum (m), and mean ($\mu$). The variance is
computed:

$\sum (x-\mu)^2/n = \sum x^2/n - \mu^2$

But since square is a continuous function with a constant-sign second derivative, we can bound the summation
on the right-hand side, and hence the bounds on the variance are:

lower bound: $\mu^2 - \mu^2 = 0$
upper bound: $m^2 + (\mu-m)(M^2-m^2)/(M-m) - \mu^2 = \mu M + \mu m - mM - \mu^2 = (\mu-m)(M-\mu)$

And so the bounds on the standard deviation are:

lower bound: 0
upper bound: $\sqrt{(\mu-m)(M-\mu)}$

We will use this result frequently.
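
The upper bound on the standard deviation is attained when every value sits at one of the two extremes, which the following sketch illustrates. The helper names and the 90-element data set are constructed here purely for the demonstration.

```python
import math
from typing import Sequence


def sd_upper_bound(m: float, M: float, mu: float) -> float:
    """Largest possible population s.d. given minimum m, maximum M and mean mu."""
    return math.sqrt((mu - m) * (M - mu))


def population_sd(values: Sequence[float]) -> float:
    """Standard deviation with a denominator of n, as used in the text."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


print(round(sd_upper_bound(10, 100, 23), 2))  # 31.64

# A two-point population with mean 23: 77 values at 10 and 13 values at 100.
extreme = [10] * 77 + [100] * 13
print(sum(extreme) / len(extreme), round(population_sd(extreme), 2))  # 23.0 31.64
```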

4.  Linear  bounds  on  the  standard  deviation 

There are two methods we can use to bound the standard deviation of a set of transformed values. First, we
can use the two bounds lines used previously, bound the sum of the squares, and subtract out the effect of the
mean (i.e. use the formula $\sum x^2/n - [\sum x/n]^2$). Second, we can construct two new lines passing through f(x) at
the mean of the transformed values.


4.1.  Sum-of-squares  bounds 

Bound line $y = ax + b$ has second moment (sum of squares) equal to

$E[(ax + b)^2] = E[a^2x^2 + 2abx + b^2] = a^2(\sigma^2 + \mu^2) + 2ab\mu + b^2 = (a\mu + b)^2 + a^2\sigma^2$

For our two bounds lines:

tangent: $a = f'(\mu)$, $b = f(\mu) - \mu f'(\mu)$
secant: $a = (f(M)-f(m))/(M-m)$, $b = f(m) - m\,[(f(M)-f(m))/(M-m)]$

hence the tangent bound on the sum of the squares is

$(\sigma^2 + \mu^2)[f'(\mu)]^2 + 2\mu f'(\mu)[f(\mu) - \mu f'(\mu)] + [f(\mu) - \mu f'(\mu)]^2$
$= \sigma^2[f'(\mu)]^2 + [f(\mu)]^2$

and the secant bound is

$\beta^2(\sigma^2 + \mu^2) + 2\mu\beta[f(m) - m\beta] + [f(m) - m\beta]^2$
where $\beta = [f(M)-f(m)]/[M-m]$

To find bounds on the variance, then, we subtract the square of the lower bound on the mean from the larger
of these two bounds to get the upper bound; and subtract the square of the upper bound on the mean from the
smaller of these two bounds to get the lower bound. The standard deviation then has upper bound the square
root of the variance upper bound, and lower bound the square root of the variance lower bound (or zero if the
variance lower bound is negative).

To return to our previous example, suppose f = ln, m = 10, M = 100, $\mu$ = 23, and also suppose $\sigma$ = 10. Then
the bounds on the sum of squares are

tangent: $629 \cdot (1/23)^2 + 2 \cdot 23 \cdot (1/23) \cdot [\ln(23) - 23 \cdot (1/23)] + [\ln(23) - 23 \cdot (1/23)]^2$
$= 1.19 + 4.28 + 4.57 = 10.04$

secant: $\beta = \ln(100/10)/(100-10) = .02558$; hence the bound is
$(.02558)^2 \cdot 629 + 2 \cdot 23 \cdot .02558 \cdot [\ln(10) - 10 \cdot .02558] + [\ln(10) - 10 \cdot .02558]^2$
$= .412 + 2.409 + 4.189 = 7.010$

Now since the bounds on the mean are 2.635 and 3.135 from our analysis in section 3, the bounds on the
square of the mean are 6.95 and 9.82. Hence bounds on the variance are 10.04 - 6.95 = 3.09 and
7.01 - 9.82 = -2.81, and bounds on the standard deviation are thus $\sqrt{3.09} = 1.76$ and 0.
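
The sum-of-squares procedure can be sketched as follows. The function names are assumptions introduced here, the derivative `df` must be supplied by the caller, and the sketch assumes (as the worked example does) that the bounds on the transformed mean are nonnegative so that squaring preserves their order. Exact arithmetic gives an upper bound of about 1.75, slightly below the hand-rounded 1.76 above.

```python
import math
from typing import Callable, Tuple


def line_second_moment(a: float, b: float, mu: float, sigma: float) -> float:
    """E[(a x + b)^2] for data with mean mu and population s.d. sigma."""
    return (a * mu + b) ** 2 + (a * sigma) ** 2


def sum_of_squares_sd_bounds(f: Callable[[float], float], df: Callable[[float], float],
                             m: float, M: float, mu: float,
                             sigma: float) -> Tuple[float, float]:
    """(lower, upper) s.d. bounds from the tangent and secant lines.

    Assumes the line-based bounds on the transformed mean are nonnegative.
    """
    a_tan, b_tan = df(mu), f(mu) - mu * df(mu)  # tangent at mu
    a_sec = (f(M) - f(m)) / (M - m)             # secant through the endpoints
    b_sec = f(m) - m * a_sec
    moments = [line_second_moment(a_tan, b_tan, mu, sigma),
               line_second_moment(a_sec, b_sec, mu, sigma)]
    means = [a_tan * mu + b_tan, a_sec * mu + b_sec]
    var_upper = max(moments) - min(means) ** 2
    var_lower = min(moments) - max(means) ** 2
    return math.sqrt(max(var_lower, 0.0)), math.sqrt(max(var_upper, 0.0))


lower, upper = sum_of_squares_sd_bounds(math.log, lambda x: 1 / x, 10, 100, 23, 10)
print(lower, round(upper, 2))
```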

4.2.  Special  standard-deviation  bounds  lines 

To bound the standard deviation of the transformed values we can use different bound lines than for the
mean. First, let us assume we know an exact value for the mean of the transformed data values; call it $\varphi$.
Distance from $\varphi$ to each transformed data value is what needs to be linearly bounded, so we use secants
through f(x) at $\varphi$ (see figure 4-1). We assume f(x) is monotonic, and hence $f^{-1}(\varphi)$ is unique, so let $f^{-1}(\varphi) = \nu$
(i.e., $\varphi = f(\nu)$). So to get an upper bound on the standard deviation of the transformed values, we use a line
below f(x) for $x < \nu$, and above for $x > \nu$; and to get a lower bound, a line above f(x) for $x < \nu$, and below for $x > \nu$.
(Vice versa for a monotonically decreasing f(x).) Now since we assume f(x) has a constant-sign second
derivative in the interval, the line segment from m to $\nu$ must lie constantly to one side of f(x), and similarly the
line segment from $\nu$ to M. Hence choose the extensions of those two line segments into lines as our bounds
lines. These lines have equations

$y = (x-\nu)(f(\nu)-f(m))/(\nu-m) + f(\nu)$
$y = (x-\nu)(f(M)-f(\nu))/(M-\nu) + f(\nu)$

Now for a bounding line y, the quantity of interest is its second moment about $f(\nu)$:

$\sigma_y^2 = E[(y - f(\nu))^2]$

And if $y = k(x-\nu) + f(\nu)$ for a line of slope k, this is:

$E[[k(x-\nu) + f(\nu) - f(\nu)]^2] = E[k^2(x-\nu)^2] = k^2[\sigma^2 + (\nu-\mu)^2]$

Hence using the formula for the variance, the second moment about the mean, the variance of the
transformed values is bounded by

$[\sigma^2 + (\nu-\mu)^2][(f(\nu)-f(m))/(\nu-m)]^2$
and
$[\sigma^2 + (\nu-\mu)^2][(f(M)-f(\nu))/(M-\nu)]^2$

Hence the standard deviation is bounded by

$\sqrt{\sigma^2 + (\nu-\mu)^2}\,[(f(\nu)-f(M))/(\nu-M)]$ and
$\sqrt{\sigma^2 + (\nu-\mu)^2}\,[(f(\nu)-f(m))/(\nu-m)]$

(taking the slopes in absolute value when f is decreasing). They are upper and lower bounds respectively for
curves with positive second derivative, and vice versa for negative second derivative. Hence the bounds are
just an "adjusted" standard deviation of the original values times the slopes of the lines from the mean of the
transformed values to the minimum and maximum on the interval.

Note since

$\sigma f'(\nu)$ is between $\sigma[(f(\nu)-f(m))/(\nu-m)]$
and $\sigma[(f(M)-f(\nu))/(M-\nu)]$, for $f''(x)$ constant-sign

a rough approximation of the standard deviation of the transformed values (as opposed to a bound) may always
be obtained from $\sigma f'(\nu)$, and this will be an increasingly good approximation as $\sigma$ gets smaller. Also note that
for a narrow range of mean bounds, the difference between our standard deviation bounds is a rough
approximation of the second derivative of f at $\nu$:

$\sigma[(f(M)-f(\nu))/(M-\nu)] - \sigma[(f(\nu)-f(m))/(\nu-m)] \approx 2\sigma f''(\nu)$

So the width of the bounds varies proportionately with the magnitude of the second derivative at the mean of
the transformed values.


Figure 4-1: Linear bounds on the standard deviation of transformed values

4.3. Handling inexact transform means

But this assumes we know $\nu$, the value that maps to the mean of the transformed values, exactly. We do for
the square function, for instance. Otherwise there is an adjustment we can make. Let the upper and lower
bounds on the value $\nu$ which maps to the transform mean be $\nu_U$ and $\nu_L$. Then the bounds on the variance of the
transformed values are

$\max[\,\max_{\nu_L \le \nu \le \nu_U} [\sigma^2 + (\mu-\nu)^2][(f(\nu)-f(m))/(\nu-m)]^2,$
$\quad\max_{\nu_L \le \nu \le \nu_U} [\sigma^2 + (\mu-\nu)^2][(f(\nu)-f(M))/(\nu-M)]^2\,]$

and

$\min[\,\min_{\nu_L \le \nu \le \nu_U} [\sigma^2 + (\mu-\nu)^2][(f(\nu)-f(m))/(\nu-m)]^2,$
$\quad\min_{\nu_L \le \nu \le \nu_U} [\sigma^2 + (\mu-\nu)^2][(f(\nu)-f(M))/(\nu-M)]^2\,]$

Since max(max(g(x)*s(x)),max(h(x)*s(x))) = max(max(g(x)*s(x),h(x)*s(x))) = max(max(g(x),h(x))*s(x)) for
nonnegative s(x), we can simplify:

$\max_{\nu_L \le \nu \le \nu_U} [\,\max[[(f(\nu)-f(M))/(\nu-M)]^2, [(f(\nu)-f(m))/(\nu-m)]^2] \cdot [\sigma^2 + (\mu-\nu)^2]\,]$

and

$\min_{\nu_L \le \nu \le \nu_U} [\,\min[[(f(\nu)-f(M))/(\nu-M)]^2, [(f(\nu)-f(m))/(\nu-m)]^2] \cdot [\sigma^2 + (\mu-\nu)^2]\,]$

First, suppose f(x) is monotonically increasing (like all of our six important functions except 1/x). If the
second derivative is positive, then the inner max is the first subexpression in the first bound above, and the
inner min is the second subexpression in the second bound. We can then rewrite the formulae:

$\max_{\nu_L \le \nu \le \nu_U} [(f(\nu)-f(M))/(\nu-M)]^2 \cdot [\sigma^2 + (\mu-\nu)^2]$

and

$\min_{\nu_L \le \nu \le \nu_U} [(f(\nu)-f(m))/(\nu-m)]^2 \cdot [\sigma^2 + (\mu-\nu)^2]$

Note that these represent the product of two functions which are both monotonically increasing with respect
to $\nu$. For a monotonically increasing f(x) with positive second derivative, $\mu$ is a lower bound on $\nu$. The product
of two nonnegative monotonically increasing functions is a monotonically increasing function. The max of a
monotonically increasing function is the value at the rightmost point, and the min is at the leftmost point. So
the revised bounds on the variance of the transformed values, given f(x) increasing and with positive second
derivative, are

upper: $[(f(\nu_U)-f(M))/(\nu_U-M)]^2 \cdot [\sigma^2 + (\mu-\nu_U)^2]$
lower: $[(f(\nu_L)-f(m))/(\nu_L-m)]^2 \cdot [\sigma^2 + (\mu-\nu_L)^2]$

Similarly if f(x) has a negative second derivative (again, assuming the first derivative is positive), we can show
by analogous reasoning that the bounds are:

upper: $[(f(\nu_L)-f(m))/(\nu_L-m)]^2 \cdot [\sigma^2 + (\mu-\nu_L)^2]$
lower: $[(f(\nu_U)-f(M))/(\nu_U-M)]^2 \cdot [\sigma^2 + (\mu-\nu_U)^2]$

Using our example of f = ln, m = 10, M = 100, $\mu$ = 23, $\sigma$ = 10, we use the previously found linear bounds on
the mean of the logarithms, which give $\nu_U = e^{3.135} = 23$ and $\nu_L = e^{2.635} = 13.9$. Hence bounds on the standard
deviation of the logarithms are:

$\sqrt{10^2 + 9.1^2}\,[(2.635-\ln(10))/(13.9-10)] \approx 1.15$
$\sqrt{10^2 + 0^2}\,[(\ln(100)-3.135)/(100-23)] \approx 0.19$

both being better than the sum-of-squares bounds in section 4.1.
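
The slope-based bounds for an increasing transformation can be sketched as below. The function name `slope_sd_bounds` and the `convex` flag are assumptions introduced here; the caller supplies the bracket on $\nu$ obtained from the bounds on the transformed mean. Run on the example, the sketch gives roughly 0.19 and 1.14; the upper value is slightly smaller than the hand calculation above because it does not round $\nu_L$ to 13.9.

```python
import math
from typing import Callable, Tuple


def slope_sd_bounds(f: Callable[[float], float], m: float, M: float, mu: float,
                    sigma: float, nu_lo: float, nu_hi: float,
                    convex: bool) -> Tuple[float, float]:
    """(lower, upper) bounds on the s.d. of f(x) for an increasing f.

    nu_lo and nu_hi bracket nu = f^{-1}(mean of f(x)) and must lie strictly
    inside (m, M). `convex` is True when f'' > 0 and False when f'' < 0.
    """
    def bound(nu: float, end: float) -> float:
        slope = (f(nu) - f(end)) / (nu - end)  # secant from (nu, f(nu)) to an endpoint
        return math.sqrt(sigma ** 2 + (mu - nu) ** 2) * abs(slope)

    if convex:
        return bound(nu_lo, m), bound(nu_hi, M)
    return bound(nu_hi, M), bound(nu_lo, m)


lower, upper = slope_sd_bounds(math.log, 10, 100, 23, 10,
                               nu_lo=math.exp(2.635), nu_hi=23.0, convex=False)
print(round(lower, 2), round(upper, 2))
```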

Unfortunately, revised formulae for monotonically decreasing functions are not as easy. The partial
derivative of the bounds expressions must be set to zero and inverted. Consider the case for the upper bound
for a curve with a negative first derivative (like 1/x):

$0 = \partial/\partial\nu\,[\,[(f(\nu)-f(M))/(\nu-M)]^2 \cdot [\sigma^2 + (\mu-\nu)^2]\,]$

$0 = 2[(f(\nu)-f(M))/(\nu-M)] \cdot [(f'(\nu)(\nu-M) - (f(\nu)-f(M)))/(\nu-M)^2] \cdot [\sigma^2 + (\mu-\nu)^2]$
$\quad + [(f(\nu)-f(M))/(\nu-M)]^2 \cdot (-2)(\mu-\nu)$

$[(f'(\nu)(\nu-M) - (f(\nu)-f(M)))/(\nu-M)^2] \cdot [\sigma^2 + (\mu-\nu)^2] = [(f(\nu)-f(M))/(\nu-M)] \cdot (\mu-\nu)$

which is then solved for $\nu$, and the value substituted in the function differentiated above to obtain the bound.

Analogously, the other bound is found by solving

$[(f'(\nu)(\nu-m) - (f(\nu)-f(m)))/(\nu-m)^2] \cdot [\sigma^2 + (\mu-\nu)^2] = [(f(\nu)-f(m))/(\nu-m)] \cdot (\mu-\nu)$

4.4.  Evaluating  standard-deviation  bounds 

The sum-of-squares bounds of section 4.1 are hard to evaluate, but we can examine the slope-based bounds
of the last section, provided we assume $\nu$ is known exactly. We are interested in knowing the largest possible
difference between the upper and lower bounds for an exact $\nu$, or the maximum of

$D(\nu) = \alpha^2[[(f(M)-f(\nu))/(M-\nu)] - [(f(\nu)-f(m))/(\nu-m)]]$
where $\alpha^2 = \sigma^2 + (\mu-\nu)^2$

For four of our functions ($x^2$, $x^3$, $1/x$, and $\sqrt{x}$) this is straightforward to find:

•  $x^2$: $D(\nu) = \alpha^2[(\nu + M) - (\nu + m)] = \alpha^2(M-m)$, so D is constant.

•  $x^3$: $D(\nu) = \alpha^2[(\nu^2 + \nu M + M^2) - (\nu^2 + \nu m + m^2)] = \alpha^2[\nu(M-m) + (M^2-m^2)]$. This has a
maximum at $\nu = M$ of $\alpha^2(M-m)(2M+m)$.

•  $1/x$: $D(\nu) = \alpha^2[1/(\nu m) - 1/(\nu M)] = \alpha^2(1/m - 1/M)/\nu$. This has a maximum at $\nu = m$ of
$\alpha^2(M-m)/(m^2 M)$.

•  $\sqrt{x}$: $D(\nu) = \alpha^2[1/(\sqrt{\nu} + \sqrt{M}) - 1/(\sqrt{\nu} + \sqrt{m})]$, which has magnitude
$\alpha^2(\sqrt{M}-\sqrt{m})/(\nu + (\sqrt{M}+\sqrt{m})\sqrt{\nu} + \sqrt{mM})$. This magnitude has a maximum at $\nu = m$ of
$\alpha^2(\sqrt{M}-\sqrt{m})/(2\sqrt{m}(\sqrt{M}+\sqrt{m}))$.
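
The bracketed slope difference in D (that is, D divided by $\alpha^2$) can be checked numerically against closed forms for these four functions. The sketch below is illustrative only: `slope_gap` is a name introduced here, the test point $\nu = 30$ on the interval [10, 100] is arbitrary, and the closed forms are those that follow algebraically from the definition of D.

```python
import math
from typing import Callable


def slope_gap(f: Callable[[float], float], m: float, M: float, nu: float) -> float:
    """Slope of the secant from nu to M minus the slope of the secant from m to nu."""
    return (f(M) - f(nu)) / (M - nu) - (f(nu) - f(m)) / (nu - m)


m, M, nu = 10.0, 100.0, 30.0
root = math.sqrt

# (function, closed form for the slope gap at nu)
checks = {
    "square": (lambda x: x ** 2, M - m),
    "cube": (lambda x: x ** 3, (M - m) * (nu + M + m)),
    "reciprocal": (lambda x: 1 / x, (1 / m - 1 / M) / nu),
    "sqrt": (root, -(root(M) - root(m)) / ((root(nu) + root(M)) * (root(nu) + root(m)))),
}

for name, (f, closed_form) in checks.items():
    print(name, slope_gap(f, m, M, nu), closed_form)
```

The gap is negative for the square root because that curve bends downwards, so the roles of the upper and lower bounds are interchanged and only the magnitude matters.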

For transcendental functions like ln(x) and $e^x$ we can attack the problem with an infinite series obtained
from the Taylor series expansion of the function about $\nu$; when the curve is relatively flat in the interval of
interest, the approximation will be good.

$D(\nu) = \alpha^2[((f(\nu)-f(M))/(\nu-M)) - ((f(\nu)-f(m))/(\nu-m))]$

Let us expand the first quotient in the brackets into a series.

$(f(\nu)-f(M))/(\nu-M) = [f(\nu) - [f(\nu) + (\nu-M)f'(\nu) + (\nu-M)^2 f''(\nu)/2! + ...]] / (\nu-M)$
$= -[f'(\nu) + (\nu-M)f''(\nu)/2! + (\nu-M)^2 f'''(\nu)/3! + ...]$
$= -\sum_{i=1}^{\infty} [(\nu-M)^{i-1} f^{(i)}(\nu)/i!]$

Hence

$D(\nu) = \alpha^2 \sum_{i=1}^{\infty} [[(\nu-m)^{i-1} f^{(i)}(\nu)/i!] - [(\nu-M)^{i-1} f^{(i)}(\nu)/i!]]$

We need to take the derivative with respect to $\nu$ of this in order to see if it has a maximum in the interval. The
condition for the maximum is thus:

$0 = \sum_{i=1}^{\infty} [[(\nu-m)^{i-1} - (\nu-M)^{i-1}]\,f^{(i+1)}(\nu)/((i+1)(i-1)!)]$

To approximate this we can take the first few terms:

$0 = (M-m)f''(\nu)/2! + (2\nu(M-m) - (M+m)(M-m))f'''(\nu)/3!$
$0 = (M-m)[f''(\nu)/2 + (2\nu-m-M)f'''(\nu)/6]$

As an example, consider $f(x) = e^x$. Then:

$0 = (M-m)[e^{\nu}/2 + (2\nu-m-M)e^{\nu}/6] = (M-m)e^{\nu}(1/2 + \nu/3 - m/6 - M/6)$

which can be solved for $\nu$.


5.  Quadratic  bounds  on  means:  Taylor-series  methods 

5.1. The problem

A straight line is not a very good approximation to a function with a strong curvature. An obvious next
step to improve our estimates of the mean is to construct quadratic bounds curves of the form $y = ax^2 + bx + c$
and compute the mean along those:

$E[ax^2 + bx + c] = a(\sigma^2 + \mu^2) + b\mu + c$

However, finding quadratic bounds curves is not as easy as it might seem. We generally cannot just use the
Taylor series about some point of the curve, as with the estimates (not bounds) of [9], because while such
approximations may stay close to the curve of the actual function on some range, they may be above and
below it at different places. For instance, take the 3-term Taylor series for f(x) = ln(x) about x = 1, which is

$0 + (x-1)(1/1) + (x-1)^2(-1/1^2)/2 = -.5x^2 + 2x - 1.5$

At x = 2 this is .5, below the logarithm curve value ln(2) = .69, but at x = .5 this is -.625, above the logarithm
curve value ln(.5) = -.69. Hence the approximation curve crosses ln(x), and cannot be used as a bound on the
values of the latter.
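
The crossing can be confirmed directly; the helper name `taylor_ln_about_1` is introduced here only for the illustration.

```python
import math


def taylor_ln_about_1(x: float) -> float:
    """Three-term Taylor series of ln(x) about x = 1."""
    return -0.5 * x * x + 2 * x - 1.5


for x in (0.5, 2.0):
    print(x, taylor_ln_about_1(x), round(math.log(x), 3))
# 0.5 -0.625 -0.693   (approximation above the curve)
# 2.0 0.5 0.693       (approximation below the curve)
```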
5.2. Quadratic bounding by vertical shifting

There is a way we can use arbitrary polynomial approximations to get bounds: we can shift the
approximation curve upwards or downwards until it no longer crosses the target curve in the interval. To put
this formally for the Taylor series, we want to bound f(x) on the interval m to M by the function

$h(x) = f(t) + (x-t)f'(t) + .5f''(t)(x-t)^2 + K$

where t is some arbitrary point in the interval, and K is some constant. If we choose $t = \mu$ (for quadratic
bounds a convenient, but not necessarily best-bound point), then the mean of the approximation function is

$E[h(x)] = f(\mu) + (\mu-\mu)f'(\mu) + .5[\sigma^2 + \mu^2 - 2\mu^2 + \mu^2]f''(\mu) + K$
$= f(\mu) + .5\sigma^2 f''(\mu) + K$

If we do not choose $t = \mu$ the formula is slightly more complicated:

$f(t) + (\mu-t)f'(t) + .5(\sigma^2 + (\mu-t)^2)f''(t) + K$

Note for the particular function $f(x) = x^2$ the Taylor series has only three terms, and hence an exact formula
for the mean of the square of a set of data values is

$\mu^2 + .5\sigma^2(2) = \mu^2 + \sigma^2$

The upper and lower bounds are then found by substituting $K_U$ and $K_L$ for K, which are respectively the
maximum and minimum values in the interval of study of the error of the approximation e(x), defined as

$e(x) = f(x) - f(t) - (x-t)f'(t) - .5(x-t)^2 f''(t)$

Since the interval is finite, we cannot just find the zeros of the derivative of e(x). Zeros have to lie within the
data-value interval, and they must be compared to two other points, the function values at the maximum and
minimum of the range. In other words:

$K_U$ is $\max[e(m), e(M), e(z_1), e(z_2), ...]$
$K_L$ is $\min[e(m), e(M), e(z_1), e(z_2), ...]$

where the $z_i$ are all zeros of $e'(x)$ within the interval. To find the zeros:

$\partial e/\partial x = f'(x) - f'(t) - (x-t)f''(t) = 0$
$[f'(x)-f'(t)]/(x-t) = f''(t)$

We always know one solution of the first equation, x = t, because

$[f'(t) - f'(t)] - (t-t)f''(t) = 0$

But there are no other solutions for functions with constant-sign derivatives, implying no other local maxima
or minima for a Taylor-series approximation. To see this, note the equation says the slope of $f'(x)$ from t to
some other point must be equal to the derivative of $f'(x)$ at t. But this cannot occur if the second derivative of
$f'(x)$ (i.e., $f'''(x)$) is constant in sign, because then each value of the first derivative of $f'(x)$ (i.e., $f''(x)$) can
occur at most once.

Hence we can write the Taylor-series quadratic bound in general as (noting e(t) = 0):

upper bound: $f(t) + (\mu-t)f'(t) + .5(\sigma^2 + (\mu-t)^2)f''(t) + \max(e(m), e(M), 0)$
lower bound: $f(t) + (\mu-t)f'(t) + .5(\sigma^2 + (\mu-t)^2)f''(t) + \min(e(m), e(M), 0)$

For particular functions f we may be able to rule out some possibilities for the min and max. For instance,
for f(x) = x^3, e(x) is just the fourth Taylor-series term, (x-t)^3*6/6, so e(M) > 0 and e(m) < 0, and bounds are

upper bound: t^3 + (μ-t)*3t^2 + .5(σ^2 + (μ-t)^2)*6t + (M-t)^3 = 3t^2(M-μ) + 3t[σ^2 + μ^2 - M^2] + M^3
lower bound: t^3 + (μ-t)*3t^2 + .5(σ^2 + (μ-t)^2)*6t + (m-t)^3 = 3t^2(m-μ) + 3t[σ^2 + μ^2 - m^2] + m^3

Similarly, e(m) < 0 from analyzing the Taylor series for logarithm and square root; 0 < e(m) for reciprocal; and
0 < e(M) for antilog.

5.3.  An  example 

To illustrate, use our previous example of f = ln, m = 10, M = 100, t = μ = 23, and σ = 10. Take the Taylor
series about μ. From the preceding we know that the only possible extremes occur at m, M, and μ, so note:

e(x) = ln(x) - [ln(23) + (x-23)/23 - .5(x-23)^2/23^2]

e(m) = ln(10) - [3.14 - .56 - .16] = 2.30 - 2.42 = -.12 = K_L

e(μ) = ln(23) - ln(23) = 0

e(M) = ln(100) - [3.14 + 3.35 - 5.6] = 4.6 - 0.9 = 3.7 = K_U

These are the bound offsets we have to add to the estimate of the mean,

ln(23) - .5*10^2/23^2 = 3.06

So we estimate the mean of the logarithms is 3.06, with an upper bound of 3.06 + max(-.12, 0, 3.7) = 6.76, and
a lower bound of 3.06 + min(-.12, 0, 3.7) = 2.94. The upper bound is much worse than the linear upper
bound (3.135), but the lower bound is better than the linear lower bound (2.635).
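
The arithmetic of this example can be checked with a short script. The following Python sketch is not part of the original paper; the function name taylor_quadratic_bounds and its argument names are illustrative assumptions. It evaluates the general Taylor-series bound formulas of section 5.2 for f = ln without rounding intermediate terms, so its results (roughly 2.93, 3.04, and 6.77) differ in the second decimal place from the hand-rounded figures above.

```python
import math

def taylor_quadratic_bounds(f, d1, d2, m, M, mu, sigma, t):
    """Return (lower, estimate, upper) for the mean of f(x) using a Taylor quadratic about t.

    Assumes (as in the text) that the third derivative of f has constant sign on
    [m, M], so the error e(x) has no interior extremum other than x = t, where e(t) = 0.
    d1 and d2 are the first and second derivatives of f.
    """
    def e(x):
        # Error of the quadratic Taylor approximation at x.
        return f(x) - f(t) - (x - t) * d1(t) - 0.5 * (x - t) ** 2 * d2(t)

    estimate = f(t) + (mu - t) * d1(t) + 0.5 * (sigma ** 2 + (mu - t) ** 2) * d2(t)
    k_upper = max(e(m), e(M), 0.0)
    k_lower = min(e(m), e(M), 0.0)
    return estimate + k_lower, estimate, estimate + k_upper

# Standard example of the text: f = ln, m = 10, M = 100, t = mu = 23, sigma = 10.
lower, estimate, upper = taylor_quadratic_bounds(
    math.log, lambda x: 1.0 / x, lambda x: -1.0 / x ** 2,
    m=10.0, M=100.0, mu=23.0, sigma=10.0, t=23.0)
print(f"lower={lower:.3f} estimate={estimate:.3f} upper={upper:.3f}")
```

The same function can be reused for other transformations by passing their first and second derivatives, provided the constant-sign precondition holds.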

5.4.  Choosing  the  optimal  point  for  the  Taylor  series 

The question arises as to the best value of t for getting an upper or lower bound. Analysis requires careful
preconditions, but we can often do something like this. Suppose that e(M) is the maximum value of e(x) on
the interval of study. The estimate of the transformed mean from taking the Taylor series about t is

f(t) + (μ-t)f'(t) + .5[σ^2 + (μ-t)^2]f''(t)

and adding e(M) to it gives the upper bound

[f(t) + (μ-t)f'(t) + .5[σ^2 + (μ-t)^2]f''(t)] + [f(M) - f(t) - (M-t)f'(t) - .5(M-t)^2 f''(t)]

= f(M) + (μ-M)f'(t) + .5[σ^2 + μ^2 - M^2 - 2μt + 2Mt]f''(t)

We want to minimize this bound with respect to t, i.e. we want:

0 = ∂/∂t [f(M) + (μ-M)f'(t) + .5[σ^2 + μ^2 - M^2 - 2μt + 2Mt]f''(t)]
0 = (μ-M)f''(t) + .5[σ^2 + μ^2 - M^2 - 2μt + 2Mt]f'''(t) + (M-μ)f''(t)
0 = .5[σ^2 + μ^2 - M^2 - 2μt + 2Mt]f'''(t)

For a function with derivatives constant in sign, this can only be zero if the expression in brackets is zero:

0 = σ^2 + μ^2 - M^2 - 2μt + 2Mt

t = [σ^2 + μ^2 - M^2]/2(μ-M)

t = [μ + M - δ_M]/2, where δ_M = σ^2/(M-μ)

Hence substituting back in the expression for the bound, the second derivative term must disappear, and we
get

f(M) + (μ-M)f'((μ + M - δ_M)/2)

which is an upper bound provided e(M) > 0 and e(M) > e(m).

By similar analysis we can show that

t = [μ + m + δ_m]/2, where δ_m = σ^2/(μ-m)
is the best t for obtaining the other bound on e(x) on the interval of interest, leading to a lower bound of

f(m) + (μ-m)f'((μ + m + δ_m)/2)
provided e(m) < 0 and e(m) < e(M). For σ = 0 the upper and lower bounds occur at t = (μ + M)/2 and
t = (μ + m)/2 respectively; and for σ the maximum, √[(M-μ)(μ-m)], these are both (M+m)/2.

So for the logarithm function (where e(m) < 0 necessarily) with μ = 23, m = 10, M = 100, and σ = 10, this gives
for a lower bound t = (23 + 10 + 100/(23-10))/2 = 20.3, and the bound is

f(m) + (μ-m)f'(20.3) = ln(10) + 13/20.3 = 2.30 + .640 = 2.94
which is negligibly better than for the series about μ, but may represent an improvement in other cases. In
general, the Taylor series approach works well for narrow intervals of interest or intervals where f(x) is rather
flat. We can, however, use order statistics to improve Taylor-series bounds; see section 10.
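
The closed-form choice of t can be evaluated directly. The Python sketch below is an illustration only (the name best_taylor_bounds is an assumption, not from the paper); it applies the two formulas for t and the resulting bounds to the standard logarithm example. For ln the third derivative is positive, so e(m) < 0 < e(M) for any t inside the interval and both preconditions hold. Under these formulas the upper bound comes out near 3.34, which is valid but looser than the linear upper bound of 3.135.

```python
import math

def best_taylor_bounds(f, d1, m, M, mu, sigma):
    """Closed-form Taylor bounds with the expansion point t chosen as in section 5.4.

    d1 is the first derivative of f. The results are bounds only under the stated
    preconditions: e(m) < 0 and e(m) < e(M) for the lower bound,
    e(M) > 0 and e(M) > e(m) for the upper bound.
    """
    delta_m = sigma ** 2 / (mu - m)
    delta_M = sigma ** 2 / (M - mu)
    t_lower = (mu + m + delta_m) / 2.0
    t_upper = (mu + M - delta_M) / 2.0
    lower = f(m) + (mu - m) * d1(t_lower)
    upper = f(M) + (mu - M) * d1(t_upper)
    return t_lower, lower, t_upper, upper

t_lower, lower, t_upper, upper = best_taylor_bounds(
    math.log, lambda x: 1.0 / x, m=10.0, M=100.0, mu=23.0, sigma=10.0)
print(f"t_lower={t_lower:.2f} lower={lower:.3f} t_upper={t_upper:.2f} upper={upper:.3f}")
```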

6.  Quadratic  bounds  on  means  from  Lagrange  interpolation 

Taylor series approximations deteriorate on the edges of an approximation interval. We are more
concerned with signed maximum deviation of the approximation from the function (a concept distinct from
the L∞ approximation, which minimizes the absolute value of deviations), and a better quadratic for our
purposes comes from the Lagrange interpolation method using the Chebyshev interpolation points. For a
quadratic we need three points to fit the curve through, giving:

h(x) = f(p)(x-q)(x-r)/(p-q)(p-r) + f(q)(x-p)(x-r)/(q-p)(q-r) + f(r)(x-p)(x-q)/(r-p)(r-q)

h(x) = (8/3(M-m)^2)[f(p)(x-q)(x-r) - 2f(q)(x-p)(x-r) + f(r)(x-p)(x-q)]

where p = m + (.5 - √3/4)(M-m), q = (M+m)/2, and r = m + (.5 + √3/4)(M-m)

Using our example of f = ln, m = 10, M = 100, μ = 23, and σ = 10, we have:

p = 16.029, q = 55.0, r = 93.971; ln(p) = 2.7744, ln(q) = 4.0073, ln(r) = 4.5430
h(x) = -.0002295x^2 + .04794x + 2.0648

Hence an estimate of the mean of the logarithms for this example is

-.0002295(10^2 + 23^2) + .04794(23) + 2.0648 = -.1444 + 1.1026 + 2.0648 = 3.0230

This is an estimate, not a bound. Just as with Taylor-series polynomials, we can get bounds from this from
knowing the extrema (maxima and minima) of the error curve on the interval of interest. For Chebyshev (as
opposed to Taylor-series) approximations there are two places in the interval where e'(x) = 0, and hence one
local maximum and one local minimum. We can find these by solving the error curve derivative explicitly;
for logarithm and cube this is a quadratic equation, for square root and reciprocal a cubic, and for exponential
a transcendental equation. For example, for our ln(x) example:

d/dx[ln(x) - (-.0002295x^2 + .04794x + 2.0648)] = 1/x + .000459x - .04794 = 0

hence .000459x^2 - .04794x + 1 = 0

and x = [.04794 ± √(.04794^2 - .001836)] / .000918 = 28.80 and 75.64

So the extrema of e(x) on the interval can occur at only four points: m = 10, M = 100, 28.80, and 75.64.
Computing e(x) there:

e(10) = -.2187, e(100) = .04137, e(28.80) = .10526, e(75.64) = -.05193

And  hence  the  Lagrange-Chebyshev  quadratic  bounds  on  the  mean  of  the  transformed  values  are: 

upper  bound:  3.0230  +  max(-.2187,  .04137,  .10526,  -.05193)  =  3.1283 
lower  bound:  3.0230  +  min(-.2187,  .04137,  .10526,  -.05193)  =  2.8043 

which  are  better  than  the  linear  bounds  of  3.135  and  2.635  (and  hence  the  Taylor  series  bounds  too). 
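
The Lagrange-Chebyshev computation for the logarithm example can be reproduced as follows. This Python sketch is an illustration under stated assumptions, not the author's program: the function name chebyshev_quadratic is hypothetical, the coefficients a, b, c are obtained by expanding the three-point Lagrange form given above, and the interior extrema are found from the quadratic equation that applies to f = ln only.

```python
import math

def chebyshev_quadratic(f, m, M):
    """Coefficients (a, b, c) of the quadratic through f at the three Chebyshev points of [m, M]."""
    w = M - m
    p = m + (0.5 - math.sqrt(3) / 4) * w
    q = (M + m) / 2.0
    r = m + (0.5 + math.sqrt(3) / 4) * w
    k = 8.0 / (3.0 * w ** 2)
    fp, fq, fr = f(p), f(q), f(r)
    # Expand k*[fp(x-q)(x-r) - 2fq(x-p)(x-r) + fr(x-p)(x-q)] in powers of x.
    a = k * (fp - 2 * fq + fr)
    b = -k * (fp * (q + r) - 2 * fq * (p + r) + fr * (p + q))
    c = k * (fp * q * r - 2 * fq * p * r + fr * p * q)
    return a, b, c

m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
a, b, c = chebyshev_quadratic(math.log, m, M)
estimate = a * (sigma ** 2 + mu ** 2) + b * mu + c

def error(x):
    # e(x) = f(x) - h(x) for f = ln
    return math.log(x) - (a * x * x + b * x + c)

# For f = ln, e'(x) = 1/x - 2*a*x - b = 0 is equivalent to 2*a*x^2 + b*x - 1 = 0.
disc = math.sqrt(b * b + 8 * a)
roots = [(-b + s * disc) / (4 * a) for s in (1, -1)]
candidates = [m, M] + [x for x in roots if m <= x <= M]
errors = [error(x) for x in candidates]
upper = estimate + max(errors)
lower = estimate + min(errors)
print(f"estimate={estimate:.4f} lower={lower:.4f} upper={upper:.4f}")
```

Because the coefficients here are not rounded to four significant figures, the results agree with the figures above only to about three decimal places.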

7.  Quadratic  bounds  on  means:  one-sided  methods 

There  are  quadratic  methods  that  avoid  having  to  find  the  extrema  of  the  error  function  in  computing  an 
approximation,  by  constructing  approximation  curves  entirely  above  or  entirely  below  the  target  function  in 
the interval. We can do this if we can position the points of intersection of the approximation curve ax^2 + bx
+  c  with  f(x)  to  lie  either  (a)  outside  the  interval,  or  (b)  tangent  at  some  point.  Among  our  six  demonstration 
functions,  reciprocal  and  cube  lead  to  cubic  polynomial  equations. 

7.1 .  Intersection  and  tangent  positioning:  reciprocal 

Consider reciprocal first. The error curve is

e(x) = 1/x - ax^2 - bx - c
and it can have at most three zeros, which are the solutions to

0 = ax^3 + bx^2 + cx - 1
To keep the approximation curve "close", we can put a point of tangency at some t inside the interval - i.e., a
double zero at t - and another zero at M. We can write this cubic as (x/t - 1)^2(x/M - 1), which
approaches -∞ for small x, +∞ for large x, reaches a local maximum at x = t, a local minimum at some larger
x value, and then crosses zero permanently at x = M. Then we want

(x/t - 1)(x/t - 1)(x/M - 1) = ax^3 + bx^2 + cx - 1

x^3/t^2M - x^2(2/tM + 1/t^2) + x(2/t + 1/M) - 1 = ax^3 + bx^2 + cx - 1
a = 1/t^2M, b = -(2/tM + 1/t^2), c = 2/t + 1/M

So the quadratic lower bound on the mean is

(σ^2 + μ^2)/t^2M - (2/tM + 1/t^2)μ + 2/t + 1/M

We are interested in the best lower bound possible, i.e. the largest. We can find this by setting to zero the
partial derivative of the preceding with respect to t:

0 = -2(σ^2 + μ^2)/t^3M + (2/t^2M + 2/t^3)μ - 2/t^2
0 = -(σ^2 + μ^2)/M + (t/M + 1)μ - t
t = [μ - (σ^2 + μ^2)/M]/(1 - μ/M)

= μ - σ^2/(M-μ) = μ - δ_M where δ_M = σ^2/(M-μ)

So for σ = 0 this is μ; for σ a maximum, namely √[(M-μ)(μ-m)] (see section 3.6), this is m. We saw this δ_M
term before in a different kind of quadratic approximation in section 5.4.

Substituting this t in the bound formula, we get a quadratic lower bound of

[(σ^2 + μ^2 - Mμ) + 2(μ-δ_M)(M-μ) + (μ-δ_M)^2] / M(μ-δ_M)^2
= 1/M + [(σ^2 + μ^2 - Mμ) + 2(Mμ - μ^2 - σ^2)] / M[(μ - σ^2/(M-μ))^2]
= 1/M + [Mμ - σ^2 - μ^2] / M[(Mμ - σ^2 - μ^2)/(M-μ)]^2
= 1/M + [(M-μ)^2 / M(Mμ - σ^2 - μ^2)]
= (1/M) [[Mμ - σ^2 - μ^2 + M^2 - 2Mμ + μ^2] / (Mμ - σ^2 - μ^2)]
= (1/M) [M^2 - Mμ - σ^2] / [Mμ - μ^2 - σ^2]
= (1/M)(M - δ_M)/(μ - δ_M)

Note that when σ = 0 this is equal to 1/M * M/μ = 1/μ, the linear bound. Since μ < M, a nonzero σ
will cause the denominator of the fraction to decrease proportionately more than the numerator, and hence
give a lower bound greater (better) than the linear lower bound. The maximum value of σ is √[(M-μ)(μ-m)],
whereupon δ_M = μ-m, and the lower bound is 1/M * [M - μ + m]/m = 1/m + 1/M - μ/mM, exactly the
upper linear bound for reciprocal (see section 3.2).

Again, let's use our standard example of m = 10, M = 100, μ = 23, σ = 10, this time for the reciprocal
function. Then

δ_M = 10^2/(100-23) = 1.299
And a lower bound on the mean of the reciprocals is

1/100 * (100 - 1.299) / (23 - 1.299) = .04548
This is better than the linear lower bound, calculated as 1/μ = .0435.

We can get an upper quadratic bound by only minor modifications: just create a bounding curve that
crosses 1/x at m instead of M, and is tangent at t in the interval. We just substitute m for M in the preceding
formulae, giving

an upper bound of (σ^2 + μ^2)/t^2m - (2/tm + 1/t^2)μ + 2/t + 1/m

taken at t = μ + σ^2/(μ-m) = μ + δ_m

which can be written as (1/m)(m + δ_m)/(μ + δ_m) where δ_m = σ^2/(μ-m)

So for our example data, t = 23 + 10^2/(23-10) = 30.69, and the upper bound is 1/10 - 13/(10*30.69) = .0576.
This is significantly better than the linear upper bound of (77/90)*.1 + (13/90)*.01 = .0871. Hence by using
a quadratic rather than linear bound we have narrowed the range of the answer by a factor of
(.0576-.0455)/(.0871-.0435) = .278.
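
The closed forms for the reciprocal bounds are simple enough to check numerically. The Python sketch below is illustrative (function names are assumptions, not from the paper). Besides reproducing the example, it tests the bounds against a made-up two-point data set, {13, 33}, chosen only because its mean is 23 and its population standard deviation is 10.

```python
def reciprocal_quadratic_bounds(m, M, mu, sigma):
    """One-sided quadratic bounds on the mean of 1/x (section 7.1 closed forms)."""
    delta_M = sigma ** 2 / (M - mu)
    delta_m = sigma ** 2 / (mu - m)
    lower = (1.0 / M) * (M - delta_M) / (mu - delta_M)
    upper = (1.0 / m) * (m + delta_m) / (mu + delta_m)
    return lower, upper

def reciprocal_linear_bounds(m, M, mu):
    """Linear bounds: tangent value 1/mu below, secant through (m, 1/m) and (M, 1/M) above."""
    lower = 1.0 / mu
    upper = ((M - mu) / m + (mu - m) / M) / (M - m)
    return lower, upper

m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
q_lower, q_upper = reciprocal_quadratic_bounds(m, M, mu, sigma)
l_lower, l_upper = reciprocal_linear_bounds(m, M, mu)
narrowing = (q_upper - q_lower) / (l_upper - l_lower)

# Hypothetical data with mean 23 and population standard deviation 10.
data = [13.0, 33.0]
true_mean_reciprocal = sum(1.0 / x for x in data) / len(data)
print(q_lower, true_mean_reciprocal, q_upper, narrowing)
```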

7.2.  Evaluation  of  the  quadratic  reciprocal  bounds 

We can obtain useful approximations of the quadratic bounds by replacing the quotient with the first few
terms of its binomial expansion, as here for the lower bound:

(1/M)(M - δ_M)(μ - δ_M)^-1 ≈ 1/μ + δ_M(1/μ - 1/M)/μ + δ_M^2(1/μ - 1/M)/μ^2

hence (1/M)(M - δ_M)(μ - δ_M)^-1 ≈ 1/μ + (1/μ^2 - 1/Mμ)δ_M + (1/μ^3 - 1/Mμ^2)δ_M^2

= 1/μ + σ^2/Mμ^2 + σ^4/(M-μ)Mμ^3

Hence the difference between the quadratic bounds can be approximated by

(1/m - 1/M)σ^2/μ^2 + (1/m(m-μ) - 1/M(M-μ))σ^4/μ^3
= [(M-m)σ^2/μ^2] [1/mM + (m + M - μ)σ^2/μmM(m-μ)(M-μ)]
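
To see how close the truncated expansion is, the series can be compared with the exact closed forms. The following Python sketch is illustrative only; the function names are assumptions. It uses the standard example, and checks that the two written forms of the approximate gap agree with each other.

```python
def exact_bounds(m, M, mu, sigma):
    """Exact one-sided quadratic bounds on the mean of 1/x (section 7.1)."""
    d_M = sigma ** 2 / (M - mu)
    d_m = sigma ** 2 / (mu - m)
    return (M - d_M) / (M * (mu - d_M)), (m + d_m) / (m * (mu + d_m))

def series_bounds(m, M, mu, sigma):
    """Three-term binomial approximations of the same bounds."""
    lower = 1 / mu + sigma ** 2 / (M * mu ** 2) + sigma ** 4 / ((M - mu) * M * mu ** 3)
    upper = 1 / mu + sigma ** 2 / (m * mu ** 2) + sigma ** 4 / ((m - mu) * m * mu ** 3)
    return lower, upper

def series_gap(m, M, mu, sigma):
    """Approximate difference between the bounds, first form in the text."""
    return ((1 / m - 1 / M) * sigma ** 2 / mu ** 2
            + (1 / (m * (m - mu)) - 1 / (M * (M - mu))) * sigma ** 4 / mu ** 3)

def series_gap_factored(m, M, mu, sigma):
    """Approximate difference between the bounds, factored form in the text."""
    return ((M - m) * sigma ** 2 / mu ** 2) * (
        1 / (m * M) + (m + M - mu) * sigma ** 2 / (mu * m * M * (m - mu) * (M - mu)))

m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
exact_lo, exact_hi = exact_bounds(m, M, mu, sigma)
approx_lo, approx_hi = series_bounds(m, M, mu, sigma)
gap = series_gap(m, M, mu, sigma)
gap_factored = series_gap_factored(m, M, mu, sigma)
print(exact_lo, approx_lo, exact_hi, approx_hi, gap)
```

For this example the lower-bound series is accurate to about five decimal places, while the upper-bound series is noticeably rougher, because δ_m/μ is about 0.33 and the expansion converges slowly.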

As suggested in the previous section, the quadratic bounds are always better than the linear bounds except at
the two extreme cases of σ. We can find the μ and σ for which they are least accurate. Set the partial
derivative of the difference between the quadratic bounds to 0:

0 = ∂/∂μ [(1/m - 1/M)σ^2/μ^2 + (1/m(m-μ) - 1/M(M-μ))σ^4/μ^3]

0 = -2(1/m - 1/M)σ^2/μ^3 + [1/m(m-μ)^2 - 1/M(M-μ)^2]σ^4/μ^3
    - 3[1/m(m-μ) - 1/M(M-μ)]σ^4/μ^4

2(1/m - 1/M) = [1/m(m-μ)^2 - 1/M(M-μ)^2]σ^2 - 3[1/m(m-μ) - 1/M(M-μ)]σ^2/μ

which can be solved iteratively.


7.3.  Intersection  and  tangent  positioning:  cube 

We can do something similar for the cube function:

e(x) = x^3 - ax^2 - bx - c
which is a third-degree polynomial just like the one for reciprocal. So we can position one intersection point
and one tangency point. This time we can write e(x) as

e(x) = (x-t)^2(x-M) = x^3 - ax^2 - bx - c
hence

a = 2t + M, b = -(t^2 + 2tM), c = t^2M
so an upper bound on the mean is

(2t + M)(σ^2 + μ^2) - (t^2 + 2tM)μ + t^2M
and this is a minimum when we choose a t such that

2(σ^2 + μ^2) - (2t + 2M)μ + 2tM = 0
[(σ^2 + μ^2) - Mμ]/(μ-M) = t
t = μ - σ^2/(M-μ) = μ - δ_M

Substituting this in the equation for the bound:

(μ-δ_M)^2(M-μ) + 2(μ-δ_M)(σ^2 + μ^2 - Mμ) + M(σ^2 + μ^2)

= μ^2M - μ^3 - 2μδ_M M + 2μ^2δ_M + δ_M^2 M - δ_M^2 μ + 2μσ^2 + 2μ^3
  - 2μ^2M - 2δ_Mσ^2 - 2δ_Mμ^2 + 2δ_MμM + σ^2M + μ^2M
= μ^3 + δ_M^2(M-μ) + (2μ + M - 2δ_M)σ^2
= μ^3 + σ^4/(M-μ) + (2μ + M - 2σ^2/(M-μ))σ^2
= μ^3 - σ^4/(M-μ) + (2μ + M)σ^2

= μ^3 + (2μ + M - δ_M)σ^2

Similarly, a lower bound is

(2t + m)(σ^2 + μ^2) - (t^2 + 2tm)μ + t^2m
and this is a maximum when we choose a t such that

t = μ + σ^2/(μ-m) = μ + δ_m
leading to a lower bound of

μ^3 + (2μ + m + δ_m)σ^2
Note the quadratic lower bound is always greater than the linear lower bound, μ^3. The difference between
the upper and lower bounds is

[M - m - δ_M - δ_m]σ^2
which provides a useful criterion for the effectiveness of these bounds. Note this is always nonnegative since

M - m - [δ_M + δ_m] = M - m - σ^2(M-m)/(M-μ)(μ-m)
= (M-m)[1 - σ^2/(M-μ)(μ-m)]

The largest possible value of σ^2 is (M-μ)(μ-m), so the quantity in brackets is always nonnegative.
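
The cube bounds can be exercised the same way as the reciprocal ones. The Python sketch below is illustrative (the function name is an assumption); the paper does not work a numeric cube example, so the figures it prints are not from the original text. It checks the bounds on the made-up data set {13, 33} (mean 23, population standard deviation 10), and also checks the extreme case σ^2 = (M-μ)(μ-m), where the two bounds should coincide.

```python
import math

def cube_quadratic_bounds(m, M, mu, sigma):
    """One-sided quadratic bounds on the mean of x^3 (section 7.3 closed forms)."""
    delta_M = sigma ** 2 / (M - mu)
    delta_m = sigma ** 2 / (mu - m)
    lower = mu ** 3 + (2 * mu + m + delta_m) * sigma ** 2
    upper = mu ** 3 + (2 * mu + M - delta_M) * sigma ** 2
    return lower, upper

m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
lower, upper = cube_quadratic_bounds(m, M, mu, sigma)

# Hypothetical data with mean 23 and population standard deviation 10.
data = [13.0, 33.0]
true_mean_cube = sum(x ** 3 for x in data) / len(data)

# Extreme case: all mass at m and M, so sigma^2 = (M - mu)(mu - m) and the gap vanishes.
sigma_max = math.sqrt((M - mu) * (mu - m))
lower_max, upper_max = cube_quadratic_bounds(m, M, mu, sigma_max)
print(lower, true_mean_cube, upper, lower_max, upper_max)
```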


8.  Optimal  quadratic  bounds 

The problem of finding the best quadratic approximation for our bounding purposes may be viewed as an
optimization problem in two variables. Since the quadratic curve ax^2 + bx + c leads to a bound of

upper bound: a(σ^2 + μ^2) + bμ + c + max_{m≤x≤M}[f(x) - ax^2 - bx - c]
lower bound: a(σ^2 + μ^2) + bμ + c + min_{m≤x≤M}[f(x) - ax^2 - bx - c]

and the constant c can be moved out of the maximum and minimum, we can write:

upper bound: a(σ^2 + μ^2) + bμ + max_{m≤x≤M}[f(x) - ax^2 - bx]
lower bound: a(σ^2 + μ^2) + bμ + min_{m≤x≤M}[f(x) - ax^2 - bx]

So we have two optimization problems for real a and b: to find the values that minimize the upper bound,
and the values that maximize the lower bound. We have constructed a program that does this by estimating
the gradient from exploratory steps, finding the zeros of the error function by the quadratic formula for
logarithm and cube, and by iterative bisection for antilog, square root, and reciprocal. Comparison with the
other obtained bounds is presented later in this paper. Unfortunately, the extrema appear to be "broad", and
convergence is slow, so the other methods discussed in this paper seem clearly desirable in most cases. While
these other methods cannot usually get the tightest bounds, the difference is usually not much.

A strong local optimum found by the optimization process is guaranteed to be the global optimum over
all quadratic curves, because the upper-bound function being minimized is convex (and, by the same argument
with min in place of max, the lower-bound function being maximized is concave). To see this, note for the
upper bound, for instance, that for 0 ≤ θ ≤ 1

(θa_1 + (1-θ)a_2)(σ^2 + μ^2) + (θb_1 + (1-θ)b_2)μ
    + max_{m≤x≤M}[f(x) - (θa_1 + (1-θ)a_2)x^2 - (θb_1 + (1-θ)b_2)x]

≤ θ[a_1(σ^2 + μ^2) + b_1μ + max_{m≤x≤M}[f(x) - a_1x^2 - b_1x]]
    + (1-θ)[a_2(σ^2 + μ^2) + b_2μ + max_{m≤x≤M}[f(x) - a_2x^2 - b_2x]]

since max(f(x) + g(x)) ≤ max(f(x)) + max(g(x)).

For  our  standard  example,  we  found  the  optimal  quadratic  bounds  to  be  3.00  and  3.10. 
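
The objective functions of this optimization are easy to state in code, even though the optimizer itself is not reproduced here. The Python sketch below is not the author's program: it evaluates the upper and lower bounds for a given (a, b), approximating the max and min over [m, M] on an evenly spaced grid (an assumption of this sketch, in place of the exact root-finding the text describes). Any general-purpose minimizer could then be applied to the two functions of (a, b).

```python
import math

def quadratic_bounds(f, a, b, m, M, mu, sigma, steps=9000):
    """Return (lower, upper) bounds on the mean of f(x) from the curve a*x^2 + b*x.

    The extrema of f(x) - a*x^2 - b*x over [m, M] are approximated on a grid of
    steps + 1 evenly spaced points, including both endpoints.
    """
    xs = [m + (M - m) * i / steps for i in range(steps + 1)]
    resid = [f(x) - a * x * x - b * x for x in xs]
    base = a * (sigma ** 2 + mu ** 2) + b * mu
    return base + min(resid), base + max(resid)

# Evaluate at the Lagrange-Chebyshev coefficients of section 6 for f = ln.
lower, upper = quadratic_bounds(math.log, -0.0002295, 0.04794, 10.0, 100.0, 23.0, 10.0)
print(f"lower={lower:.4f} upper={upper:.4f}")
```

At the Lagrange-Chebyshev coefficients this gives roughly 2.804 and 3.128, matching section 6; the optimal pair reported above (3.00 and 3.10) would be the result of optimizing over a and b.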

9.  Improving  accuracy  with  outliers  and  statistics  on  subsets 

We can tighten bounds if we know additional information about a set of data values. We may know a few
extreme values on the range (outliers), and be able to remove these points from the analysis of the rest of the
points. This helps a good deal when m and/or M are unusually unrepresentative of the distribution (and
notice how frequently we have used m and M in our formulas). With the outliers removed, the remaining
values can have a narrower range, on which the function can be better matched by a linear or quadratic
approximation. The transformed values for the known outliers can then be added to the total mean or total
variance in a final step.

But we can generalize this. We can improve accuracy of bounds any time we know means and variances of
arbitrary subsets of the original data values. We may then estimate statistics on the transformed values for
each subset and combine them with the appropriate weighting.

9.1 .  An  example 

For instance, from [8], there were 6133 merchant ships with United States registry in 1982, of an average
gross tonnage of 3120 per ship. Of these, 2941 were fishing vessels, of average tonnage 199.6 gross tons; 548
were cargo ships, of average tonnage 9790 tons; 361 were tankers, of average tonnage 2670 tons. Hence there
were 6133 - 2941 - 548 - 361 = 2283 other ships of average tonnage [(6133*3120) - (2941*200) - (548*9740)
- (361*2670)]/2283 = [19,130,000 - 588,000 - 5,340,000 - 965,000] / 2283 = 5320 tons.

Now suppose we want the mean of the logarithms of the tonnage values. Consider the upper bounds on
each of the four disjoint subsets. These are just the logarithms of the means, or 5.30, 9.21, 7.88, and 8.57.
Hence the total upper bound is the weighted mean of these upper bounds, or
[(5.30*2941) + (9.21*548) + (7.88*361) + (8.57*2283)] / 6133 = 7.018. This should be compared with the
upper bound derived from the mean of the entire set, ln(3120) = 8.03, so the subdivision data gave us a
significant improvement.

Unfortunately, we do not know anything about the maximum and minimum tonnage of classes of ships, so
we cannot get a cumulative lower bound. However, we know m = 100 for this table, and M = 200,000 is a
reasonable figure from knowledge of merchant shipping, so a global lower bound is found by

a = (3120-100)/(200000-100) = .0151

lower bound is ln(100) + a(ln(200000) - ln(100)) = 4.60 + .0151*7.60
= 4.60 + .115 = 4.715
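
The subset-combination step can be written out as a short computation. The Python sketch below is illustrative; it takes the subset counts and mean tonnages as quoted in the text (including the derived figure of 5320 for the remaining ships) and uses the fact that, for a concave function such as the logarithm, the upper bound for each subset is the logarithm of its mean. Computed without intermediate rounding, the combined upper bound comes out near 7.02 and the global lower bound near 4.72.

```python
import math

# (count, mean tonnage) for fishing vessels, cargo ships, tankers, and other ships,
# as quoted in the text.
subsets = [(2941, 199.6), (548, 9790.0), (361, 2670.0), (2283, 5320.0)]
total = sum(n for n, _ in subsets)

# Weighted mean of the per-subset upper bounds ln(mean).
combined_upper = sum(n * math.log(mean) for n, mean in subsets) / total

# Upper bound from the overall mean alone.
single_upper = math.log(3120.0)

# Global linear lower bound: secant of ln between m and M, evaluated at the mean.
m, M, mu = 100.0, 200000.0, 3120.0
alpha = (mu - m) / (M - m)
global_lower = math.log(m) + alpha * (math.log(M) - math.log(m))
print(total, round(combined_upper, 3), round(single_upper, 3), round(global_lower, 3))
```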

9.2.  Proof  of  desirability  of  subdivision  for  linear  bounds 

It  can  be  proved  that  linear  bounds  on  the  mean  are  never  worsened  by  using  such  subset  statistics.  This 
can  be  seen  graphically  in  figure  9-1.  We  consider  here  the  case  of  binary  subdivision,  and  further 
subdivisions  can  be  covered  by  extension.  We  also  consider  only  functions  concave  downwards,  but  the  other 
case  can  be  handled  analogously. 

First consider the lower bound. If the ranges of the subdivisions are the same as the full set, then the two
lower bounds must lie along the same line, and their weighted average must lie along the line too; hence the
lower bound of the full set is exactly the weighted average of the two lower bounds. If one or both of the
subsets has a narrower range of values than the full set, this can only increase (improve) the lower bound since
a secant across a subrange lies fully above a secant across a range containing the subrange. Hence the lower
bound cannot get any worse in this subdivision summation of linear lower bounds.

The upper bound also cannot be any worse. This time range reduction within a subset does not matter
because the upper bound is constrained to lie along the curve of the function, which is independent of where
it is sliced. The weighted average of the two subset upper bounds is a point along the line connecting two
points on the function curve. But since the function is concave downwards, this point is always below the
function. But since the upper bound on the full set is constrained to lie on the curve, the subdivision process
always guarantees a better upper bound as long as the two subdivision means are different, and no worse if
they are not different.

Figure 9-1: Improvements in linear bounds from combining statistics on two disjoint sets

10.  Exploiting  order  statistics  as  well 

So  far  we  have  only  assumed  knowledge  of  the  maximum,  minimum,  mean,  and  (sometimes)  standard 
deviation  of  sets  of  data  values.  If  we  have  additional  statistics  on  the  data  values  we  can  do  a  better  job  of 
estimating  statistics  on  the  transformed  values.  In  this  section  we  discuss  using  order  statistics  (e.g.  medians 
and  percentiles).  Order  statistics  have  the  nice  property  that  they  have  one-to-one  mappings  from  the  original 
data values to the transformed values under the monotonic transformations we are assuming.

10.1. Using the median

First,  assume  we  know  a  median  in  addition  to  the  maximum,  minimum,  and  mean.  We  can  often  get  an 
immediate  improvement  in  the  bounds  on  estimates.  Let  the  error  curve  (linear,  quadratic,  or  whatever)  be 
e(x).  Then  the  median  can  be  thought  to  partition  the  points  into  two  equal-sized  subranges  (assume  the 
number  of  points  to  be  large  enough  so  that  even  numbers  of  points  don't  bother  us).  Then  an  upper  bound 
on  the  mean  of  the  transformed  values  is  the  estimate  given  by  the  approximation  curve  plus  one  half  the 
maximum of the error curve in the range to the left of the median plus one half the maximum of the error
curve in the range to the right of the median. The lower bound on the mean is found by substituting
"minimum" for "maximum" in the above rule. Thus knowing the median decreases the influence of extrema
of  the  error  curve. 

10.2.  Other  order  statistics 

We can generalize these ideas to the situation where we know arbitrary order statistics on the original
distribution. Denote these statistics as r pairs of the form <x_i, f_i>, where fraction f_i of the items in the
distribution are claimed to lie to the left of value x_i. Then we can generalize the formula of section 5 as
follows:

upper bound is <estimate from approximation curve> - Σ_{1≤i≤r} [(f_i - f_{i-1}) * min_{x_{i-1}≤x≤x_i} [e(x)]]

lower bound is <estimate from approximation curve> - Σ_{1≤i≤r} [(f_i - f_{i-1}) * max_{x_{i-1}≤x≤x_i} [e(x)]]

where e(x) is the error curve a(x)-f(x), x_0 is defined as m, with f_0 = 0, and x_r is defined as M (with
corresponding f_r of 1). Thus the effects of the extreme points of e(x) are "diluted" by their fractional
coefficients, and the more order statistics are known, the tighter the eventual bounds.

Under certain circumstances we can simplify the above formulae considerably. If we know even-subdivision
order statistics (i.e., f_i = i/r, r the number of order statistics), and if the error curve e(x) is
monotonic, then the maximum and minimum of e(x) in each subinterval between the order statistic ordinates
x_i must lie at the endpoints. So if e(x) is monotonic increasing, the upper bound is [Σ_{1≤i≤r} e(x_i)]/r and the
lower bound is [Σ_{1≤i≤r} e(x_{i-1})]/r; and vice versa if e(x) is monotonic decreasing. Hence the absolute range
between the upper bound and lower bound is always the same number, |e(x_r)-e(x_0)|/r = |e(M)-e(m)|/r.
(Note that Taylor-series quadratic approximations are monotonic if e(m)<0<e(M) or e(M)<0<e(m), conditions
which occur frequently.)
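
The weighting of error extrema by order-statistic fractions can be sketched in code. The Python example below is an illustration under several assumptions: the function name is hypothetical; the error is taken with the sign convention of sections 5 and 6, e(x) = f(x) - h(x), so that extrema are added to the estimate rather than subtracted; only interior order statistics are passed in, with the endpoints m (fraction 0) and M (fraction 1) appended automatically; and extrema on each subinterval are approximated on a grid. The median value of 20 used in the demonstration is made up for illustration and is not from the paper.

```python
import math

def bounds_with_order_stats(error, estimate, m, M, order_stats, steps=2000):
    """Return (lower, upper) bounds on the transformed mean.

    order_stats is a list of (x_i, f_i) pairs sorted by x_i, where f_i is the
    fraction of items lying to the left of x_i. Each subinterval's error extremum
    is weighted by the fraction of items that fall in that subinterval.
    """
    xs = [m] + [x for x, _ in order_stats] + [M]
    fs = [0.0] + [f for _, f in order_stats] + [1.0]
    upper = lower = estimate
    for i in range(1, len(xs)):
        lo, hi = xs[i - 1], xs[i]
        vals = [error(lo + (hi - lo) * k / steps) for k in range(steps + 1)]
        weight = fs[i] - fs[i - 1]
        upper += weight * max(vals)
        lower += weight * min(vals)
    return lower, upper

def error(x):
    # Error of the Lagrange-Chebyshev quadratic of section 6 for f = ln.
    return math.log(x) - (-0.0002295 * x * x + 0.04794 * x + 2.0648)

estimate = 3.0230
plain = bounds_with_order_stats(error, estimate, 10.0, 100.0, [])
with_median = bounds_with_order_stats(error, estimate, 10.0, 100.0, [(20.0, 0.5)])
print(plain, with_median)
```

With no order statistics the function reproduces the section 6 bounds; with the hypothetical median the interval shrinks from about (2.80, 3.13) to about (2.89, 3.11).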

10.3.  Order  statistics  and  the  standard  deviation 

Order statistics are also helpful in estimating the standard deviation of the transformed values, especially
order statistics for the leftmost and rightmost subranges of the interval. Recalling the bounds lines drawn
through the mean of the transformed values in section 4.2, we had to draw them so they lay entirely above the
curve to one side of the mean, and entirely below on the other side, and this is a highly conservative
assumption. Assume ν is known precisely. We could probably get a better bound if we knew how many
points lay to the left of some x_1, and then drew a secant of f(x) from the transform mean to it, rather than from
the transform mean to m; or if we knew how many points lay to the right of some x_{r-1}, and drew a secant from
the transform mean to it instead of M. See figure 10-1.

The estimate of the standard deviation of the transformed values obtained from these lines is just their
slope times the original standard deviation. But to get a bound, we need a correction for the points lying more
extreme than the new point of intersection. Consider the example of a curve concave downwards like
logarithm, and take the upper bound line from the transform mean to some point to the left; call the point x_1,
and let it be an order statistic so that fraction p of the distribution lies to the left of it. Assume the mean of the
transformed values is known exactly. Then the correction for a bound corresponds to the situation where all
the p points are at m, which means a difference in the variance of

p*[(f(ν)-f(m))^2 - [(ν-m)*(f(ν)-f(x_1))/(ν-x_1)]^2]
where ν is the number which maps functionally to the mean of the transformed values. Hence the expression
for the upper bound on the standard deviation is

[[σ^2 + (μ-ν)^2][(f(ν)-f(x_1))/(ν-x_1)]^2 + p*(f(ν)-f(m))^2 - p*[(ν-m)*(f(ν)-f(x_1))/(ν-x_1)]^2]^.5

So using such a bounds line can give a better slope, but one pays a penalty of a correction term which
subtracts from the slope improvement. An obvious question is under what conditions use of the order statistic
helps. It turns out this has a surprising answer when ν is known exactly. Denote the two slopes as s_m and s_o,
i.e.

s_m = (f(ν)-f(m))/(ν-m), s_o = (f(ν)-f(x_1))/(ν-x_1)
we can rewrite our expression for the upper bound as

[[σ^2 + (μ-ν)^2] * s_o^2 + p*s_m^2*(ν-m)^2 - p*s_o^2*(ν-m)^2]^.5

This will represent an improvement on the linear upper bound [σ^2 + (μ-ν)^2]^.5 s_m if

[σ^2 + (μ-ν)^2]s_m^2 > [σ^2 + (μ-ν)^2] * s_o^2 + p*s_m^2*(ν-m)^2 - p*s_o^2*(ν-m)^2
or [σ^2 + (μ-ν)^2][s_m^2 - s_o^2] > p*[s_m^2 - s_o^2]*(ν-m)^2

So the slope terms cancel, and use of the order statistic <x_1,p> is going to be helpful when:

[σ^2 + (μ-ν)^2] > p*(ν-m)^2
or p < [(σ^2 + (μ-ν)^2)/(ν-m)^2]

Figure 10-1: Exploiting order statistics for better bounds on the standard deviation
This result is independent of where the order statistic is within the distribution (x_1), and depends only on the
standard deviation and minimum of the original distribution, and the mean of the transformed values. The
corresponding result for the rightmost order statistic is

p < [(σ^2 + (μ-ν)^2)/(M-ν)^2]

where p is the fraction of items to the right of x_{r-1}.

If we know other order statistics than just the leftmost and rightmost (x_1 and x_{r-1}) we can get better bounds,
though predicting the improvement is difficult. For instance, if we know x_2, we can take a line from ν to x_2,
and estimate the contribution to the correction factor from the items between x_1 and x_2 differently than the
contribution of items between m and x_1.
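
The threshold on p can be checked numerically. The Python sketch below is an illustration only: the function names are assumptions, and the values ν = 21 and x_1 = 15 are made up for the demonstration (they are not from the paper). It evaluates the linear upper bound and the order-statistic upper bound on the standard deviation for f = ln, and confirms that the two coincide exactly at the threshold p = (σ^2 + (μ-ν)^2)/(ν-m)^2, with the order statistic helping below it and hurting above it.

```python
import math

def sd_upper_linear(f, m, mu, sigma, nu):
    """Linear upper bound on the standard deviation, using the secant from nu to m."""
    s_m = (f(nu) - f(m)) / (nu - m)
    return math.sqrt(sigma ** 2 + (mu - nu) ** 2) * s_m

def sd_upper_with_order_stat(f, m, mu, sigma, nu, x1, p):
    """Upper bound using the secant from nu to x1, plus the correction for fraction p left of x1."""
    s_m = (f(nu) - f(m)) / (nu - m)
    s_o = (f(nu) - f(x1)) / (nu - x1)
    second_moment = sigma ** 2 + (mu - nu) ** 2
    variance = second_moment * s_o ** 2 + p * (nu - m) ** 2 * (s_m ** 2 - s_o ** 2)
    return math.sqrt(variance)

def helpful_threshold(m, mu, sigma, nu):
    """Largest fraction p for which the order statistic still improves the bound."""
    return (sigma ** 2 + (mu - nu) ** 2) / (nu - m) ** 2

m, mu, sigma = 10.0, 23.0, 10.0
nu, x1 = 21.0, 15.0          # hypothetical values for illustration
threshold = helpful_threshold(m, mu, sigma, nu)
linear = sd_upper_linear(math.log, m, mu, sigma, nu)
better = sd_upper_with_order_stat(math.log, m, mu, sigma, nu, x1, p=0.2)
worse = sd_upper_with_order_stat(math.log, m, mu, sigma, nu, x1, p=0.95)
at_threshold = sd_upper_with_order_stat(math.log, m, mu, sigma, nu, x1, p=threshold)
print(threshold, linear, better, worse)
```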


10.4.  Adjustment  of  standard  deviation  for  an  inexact  transform  mean 

If  wc  do  not  know  the  exact  mean  of  the  transformed  values,  cp  =  fi»,  we  must  adjust  these  results.  Let 

the  bounds  on  the  transform  mean  be  v{   and  v„  as  in  section  4.3.    Assume  f(x)  has  a  negative  second 

derivative.  The  formula  for  the  upper  bound  is 

[[a2  +  (M^)2][(f(^-f(xi))/(^-xi)]2  +  p*(^)-f(m))2-p*[("-m)*(f(i')-f(x1))/(^x1)]2]-5 

Since  v  </a,  [[a  +(ju.-y)  ]  is  monotonically  decreasing  with  v  in  its  range.  The  rest  of  the  expression  is  the 

difference  of  a  term  and  the  difference  of  two  others.    The  first  term  is  monotonically  decreasing  with 

increasing  v  since  the  second  derivative  of  the  curve  is  negative.   This  represents  the  second  moment  of  f, 

items  grouped  at  m  on  the  curve.  As  v  increases,  the  possible  distance  these  items  could  be  off  the  bound  line 

increases,  and  their  relative  weight  increases  as  f(f )  becomes  relatively  larger  than  f(m).    Hence  since  this 

correction  term  is  subtracted  from  the  slope,  the  effect  as  v  increases  will  be  for  all  the  terms  to  decrease. 

Hence  the  adjusted  value  for  the  upper  bound  on  the  standard  deviation  of  the  transform  values  is  just 

[[a2  +  (M-r,  )2]MV]  )-f(Xj))/(^  -Xi)]2  +  p*(H>!  )-f(m))2 
-p*[(»'L-m)*(f(»'L)-f(x1))/(«/L-x1)]2]-5 

substituting  v u  for  v  in  the  exact-p  formula. 

Similarly, we substitute $\nu_U$ for $\nu$ to get an adjusted lower bound. Analogously, we handle curves with a positive second derivative by substituting $\nu_U$ for $\nu$ for an upper bound, and $\nu_L$ for $\nu$ for a lower bound.

10.5. Quasi-order statistics from the standard deviation

If we know the mean and standard deviation of a set of data values, we can use Chebyshev's inequality to bound the number of items lying more than a certain distance from the mean. This information is like an order statistic, but since it only represents an upper bound on the number of items in a region and not an exact number of items, it must be used carefully. It can only be used for partitions of the interval of interest into two parts: the subinterval of points farther than a certain distance to the left (or right) of the mean, and a subinterval of all other points of the interval. It can also only be used for an upper bound on the mean of the transformed values, given $\mu$, when e(x) has a maximum on the first subinterval that is more than the maximum on the second, or for a lower bound when e(x) has a minimum on the first subinterval that is less than the minimum on the second.

Actually, Chebyshev's inequality in the standard form (that only a fraction $\sigma^2/D^2$ of the points of a distribution can lie more than D units from the mean) is not the best inequality we can get, since it refers to both tails of a distribution, and we are only concerned with the number of points in one tail. Only a fraction $\sigma^2/(\sigma^2 + D^2)$ of the points can lie to the left of a point D to the left of the mean, or lie to the right of a point D to the right of the mean. To see this, note that if a fraction f of the points lie to the left of a point D units to the left of the mean, then their weighted second moment about the mean is at least $fD^2$, which must be less than $\sigma^2$. But in order for the mean to be at the place it is, this fraction f of the points must be compensated for by the remaining fraction (1-f) of the points, R units to the other side of the mean. For maximal f, these other (1-f) points must all be at the same location, for otherwise they would have a nonzero variance which would add to the variance of the whole distribution, and would require a lower maximum f. Hence we have two equations to solve simultaneously:

$$fD^2 + (1-f)R^2 = \sigma^2$$
$$fD - (1-f)R = 0$$

which imply

$$R = \frac{Df}{1-f}, \qquad \frac{fD^2}{1-f} = \sigma^2, \qquad f = \frac{\sigma^2}{\sigma^2 + D^2}$$
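The algebra above can be checked numerically. The following is a minimal Python sketch, not part of the original programs: the function names are hypothetical, and the numbers in the usage note (mean 23, standard deviation 10, D = 5) are taken from the standard example only for illustration. It builds the two-point distribution with fraction f at distance D to the left of the mean and the rest at distance R to the right, and recomputes its mean and variance. (This one-sided form of Chebyshev's inequality is often called Cantelli's inequality.)

```python
def max_tail_fraction(sigma, D):
    """Largest fraction of points that can lie D or more to one side of the mean."""
    return sigma ** 2 / (sigma ** 2 + D ** 2)


def extremal_two_point(mu, sigma, D):
    """Two-point distribution attaining the bound.

    A fraction f of the points sits at mu - D; the remaining 1 - f sit at
    mu + R, with R = D f / (1 - f) as derived in the text.
    Returns a list of (location, weight) pairs.
    """
    f = max_tail_fraction(sigma, D)
    R = D * f / (1 - f)
    return [(mu - D, f), (mu + R, 1 - f)]


def mean_and_variance(points):
    """Mean and variance of a weighted discrete distribution."""
    mean = sum(x * w for x, w in points)
    var = sum(w * (x - mean) ** 2 for x, w in points)
    return mean, var
```

Calling `mean_and_variance(extremal_two_point(23, 10, 5))` should reproduce the mean and variance that were put in, which is what the two simultaneous equations require.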

Using this result, we can then put bounds on the mean of the transformed values of

upper bound:
$$f(\mu) + .5\sigma^2 f''(\mu) + \frac{\sigma^2}{\sigma^2+D^2}\max_{m \le x \le \mu-D} e(x) + \frac{D^2}{\sigma^2+D^2}\max_{\mu-D \le x \le M} e(x),$$
provided the first max value is greater than the second

lower bound:
$$f(\mu) + .5\sigma^2 f''(\mu) + \frac{\sigma^2}{\sigma^2+D^2}\min_{m \le x \le \mu-D} e(x) + \frac{D^2}{\sigma^2+D^2}\min_{\mu-D \le x \le M} e(x),$$
provided the first min value is less than the second

These are the left-sided bounds; we can also get analogous expressions for bounds using points on the right of a distribution. Unfortunately, we cannot find optimal values of D for these formulas because the derivative cannot be applied to them.

Note that while it may be difficult to determine for an arbitrary e(x) whether the maximum in one interval is greater than in another, the Taylor-series quadratic approximation often has this property for either the left-side or right-side rule.
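As an illustration of the left-sided rule, here is a minimal Python sketch. It assumes e(x) is the error of the quadratic Taylor approximation about the mean, and it locates the maxima and minima by evaluating e(x) on an evenly spaced grid, which is only an approximation unless the extremes fall at the subinterval endpoints (as they do for a monotonic error curve). All function names and the `steps` parameter are hypothetical and not from the original INTERLISP programs.

```python
import math


def one_sided_fraction(sigma, D):
    """Largest fraction of points that can lie D or more to the left of the mean."""
    return sigma ** 2 / (sigma ** 2 + D ** 2)


def taylor_error(f, df, d2f, mu):
    """Return e(x) = f(x) minus the quadratic Taylor polynomial of f about mu."""
    def e(x):
        taylor = f(mu) + (x - mu) * df(mu) + 0.5 * (x - mu) ** 2 * d2f(mu)
        return f(x) - taylor
    return e


def grid(lo, hi, steps=2000):
    """Evenly spaced points from lo to hi, endpoints included."""
    return [lo + (hi - lo) * k / steps for k in range(steps + 1)]


def left_sided_bounds(f, df, d2f, mu, sigma, m, M, D):
    """Left-sided quasi-order-statistic bounds on the mean of f(x).

    Returns (lower, upper); an entry is None when the proviso for that
    bound (first extreme more extreme than the second) does not hold.
    """
    e = taylor_error(f, df, d2f, mu)
    w = one_sided_fraction(sigma, D)
    first = [e(x) for x in grid(m, mu - D)]    # subinterval m .. mu - D
    second = [e(x) for x in grid(mu - D, M)]   # subinterval mu - D .. M
    base = f(mu) + 0.5 * sigma ** 2 * d2f(mu)
    lower = upper = None
    if max(first) > max(second):
        upper = base + w * max(first) + (1 - w) * max(second)
    if min(first) < min(second):
        lower = base + w * min(first) + (1 - w) * min(second)
    return lower, upper
```

For f(x) = ln(x) the Taylor error curve increases with x, so this left-sided sketch yields only a lower bound; the upper bound would come from the analogous right-sided rule.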

10.6.  Evaluation  of  quasi-order  statistics  from  the  standard  deviation 

Let us return to the analysis in section 5.3 of our standard example with the quadratic Taylor series approximation at $\mu$. Choose as subintervals 10<x<33 and 33<x<100, so D = 33-23 = 10 = $\sigma$. Since the error curve is monotonically increasing ($e(m) < e(\mu) < e(M)$, and no $e'(x) = 0$ except at $\mu$) the maxima on the subintervals are at the rightmost points, and the minima at the leftmost. Hence the maxima are e(33) = 3.50-(3.14+.435-.106) = .03 and e(100) = 3.7. Similarly for the other bound, choose D = 5, 10<x<18, and 18<x<100; and the minima are e(10) = -.12 and e(18) = 2.89-(3.14-.217-.023) = -.01. The maximum fraction f for x = 33 is $10^2/(10^2+10^2) = .5$, and for x = 18 is $10^2/(10^2+5^2) = .8$. Hence the revised bounds on the mean of the transformed values are

lower:  3.06  -  .5*.03  -  .5*3.7  =  1.20 
upper:  3.06  -  .8*-.12  -  .2*-.01  =  3.16 

which  are  better  than  the  bounds  obtained  in  section  5.3. 

D is a parameter here that can vary arbitrarily. Let us find the best value for it, for the case of a Taylor series approximation where e(x) increases with x, and a lower bound:

$$0 = \frac{\partial}{\partial D}\left[\frac{\sigma^2}{\sigma^2 + D^2}\,e(\mu-D) + \frac{D^2}{\sigma^2 + D^2}\,e(M)\right]$$
$$0 = \frac{\partial}{\partial D}\left[\frac{\sigma^2 e(\mu-D) + D^2 e(M)}{\sigma^2 + D^2}\right]$$

so

$$\sigma^2 \frac{\partial}{\partial D}\left[f(\mu-D) - f(\mu) - Df'(\mu) - .5D^2 f''(\mu)\right] + 2D\,e(M) = \left[\sigma^2\left[f(\mu-D) - f(\mu) - Df'(\mu) - .5D^2 f''(\mu)\right] + D^2 e(M)\right] \cdot 2D$$

Hence

$$\sigma^2\left[-f'(\mu-D) - f'(\mu) - Df''(\mu)\right] + 2D\,e(M) = \left[\sigma^2\left[f(\mu-D) - f(\mu) - Df'(\mu) - .5D^2 f''(\mu)\right] + D^2 e(M)\right] \cdot 2D$$

or

$$\frac{2D\,e(M)(1-D^2)}{\sigma^2} = f'(\mu-D) + (1-2D^2)f'(\mu) + D(1-D^2)f''(\mu) + 2Df(\mu-D) - 2Df(\mu)$$

which we can solve by iterative methods to find the best value of D.

10.7.  Splines  and  order  statistics 

We have not referred to spline approximations in the preceding analysis because if an approximation curve is divided into pieces with different properties then we must know how many data points are in each to calculate means and standard deviations on the transformed values. One might think that for a given set of order statistics on a distribution we may be able to create a spline approximation broken at the points at which the order statistics are sited, and use that for bounding. But we still need to know the means of every subinterval, the knowledge discussed in section 9, which may be difficult to obtain. Thus splines may be difficult to use.

11. Using fits to known distributions

As  a  final  kind  of  information  which  we  might  have  about  a  set  of  values,  we  might  know  that  their 
distribution  is  close  to  some  well-known  distribution,  with  a  certain  allowed  tolerance.  If  the  tolerance  is 
small  we  can  expect  quite  tight  bounds  on  the  transformed  values.  But  estimating  statistics  this  way  requires 
special  preparation  in  advance  (namely,  measuring  fits  to  a  predicted  distribution),  and  is  not  possible  with 
most  data  presented  in  already-aggregated  units. 

11.1.  General  formula  for  known  distributions 

A well-known result (e.g. [3], section 7.3) gives the distribution of the transform of some probability distribution p(x), under the transformation function f(x), as

$$q(y) = p(f^{-1}(y)) \left|\frac{df^{-1}(y)}{dy}\right|$$

as a function of y, provided f is either monotonically increasing or decreasing in the interval.

So for instance if our p(x) approximates a uniform distribution on the interval m to M, $q(y) = (1/(M-m))\,|df^{-1}(y)/dy|$. For f(x) = ln(x), $q(y) = e^y/(M-m)$ on the interval y = ln(m) to y = ln(M); an estimate of the mean of q(y) is

$$\frac{\int y\,q(y)\,dy}{\int q(y)\,dy} = \frac{(\ln(M)-1)M - (\ln(m)-1)m}{M-m} = -1 + \frac{M\ln(M) - m\ln(m)}{M-m}$$

and an estimate of the second moment about zero is

$$\frac{\int y^2 q(y)\,dy}{\int q(y)\,dy} = \frac{M[\ln^2(M) - 2\ln(M) + 2] - m[\ln^2(m) - 2\ln(m) + 2]}{M-m}$$

which minus the square of the estimate of the mean gives an estimate of the variance.
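The closed forms for the logarithm of a uniform distribution are easy to check against a direct numerical average. The sketch below is illustrative only; the function names and the midpoint-rule helper are assumptions, not part of the original programs.

```python
import math


def log_uniform_mean(m, M):
    """Mean of ln(x) when x is uniform on [m, M] (closed form from the text)."""
    return (M * math.log(M) - m * math.log(m)) / (M - m) - 1.0


def log_uniform_second_moment(m, M):
    """Second moment about zero of ln(x) when x is uniform on [m, M]."""
    def h(t):
        L = math.log(t)
        return t * (L * L - 2.0 * L + 2.0)
    return (h(M) - h(m)) / (M - m)


def log_uniform_variance(m, M):
    """Variance of ln(x): second moment minus the square of the mean."""
    return log_uniform_second_moment(m, M) - log_uniform_mean(m, M) ** 2


def midpoint_average(g, m, M, steps=100000):
    """Average of g(x) over [m, M] by the midpoint rule (numerical cross-check)."""
    width = (M - m) / steps
    return sum(g(m + (k + 0.5) * width) for k in range(steps)) / steps
```

For example, `log_uniform_mean(10, 100)` can be compared with `midpoint_average(math.log, 10, 100)`; the two should agree to within the accuracy of the midpoint rule.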

For p(x) uniform and f(x) = 1/x, $q(y) = 1/(y^2(M-m))$ on the interval y = 1/M to y = 1/m; an estimate of the mean of q(y) is

$$\frac{\ln(1/m) - \ln(1/M)}{M-m} = \frac{\ln(M/m)}{M-m}$$

and an estimate of the second moment about zero is $(1/m - 1/M)/(M-m) = 1/(mM)$, hence an estimate of the variance is

$$\frac{1}{mM} - \left[\frac{\ln(M/m)}{M-m}\right]^2$$

11.2. Handling inexact fits to distributions

We have not addressed how to get bounds on means and standard deviations. We can do this by defining an "upper fit" $\omega_U$ and "lower fit" $\omega_L$ on the discrete set of n values $x_i$ such that

$$\omega_U = \max_i\,[x_i - g_i], \qquad \omega_L = \min_i\,[x_i - g_i]$$

where $\int_{-\infty}^{g_i} p(x)\,dx = (i-.5)/n$, and p(x) is the distribution the $x_i$ fit to. In other words, the fits are the maximum and minimum deviations of an $x_i$ from its value predicted by the approximating distribution p(x).
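A minimal Python sketch of this definition follows. It assumes the data values are sorted before being matched to the predicted positions, and it takes the approximating distribution in the form of its quantile function (the inverse of the cumulative distribution); the function names are hypothetical.

```python
def fit_tolerances(values, quantile):
    """Return (upper fit, lower fit) of the data against a distribution.

    quantile(u) must return the point g with cumulative probability u under
    the approximating distribution p(x). The i-th smallest of the n values
    is compared with the predicted position g_i = quantile((i - .5) / n).
    """
    xs = sorted(values)
    n = len(xs)
    deviations = [x - quantile((i - 0.5) / n) for i, x in enumerate(xs, start=1)]
    return max(deviations), min(deviations)


def uniform_quantile(a, b):
    """Quantile function of the uniform distribution on [a, b]."""
    return lambda u: a + (b - a) * u
```

With five values and a uniform distribution on 10 to 100, the predicted positions are 19, 37, 55, 73, and 91, so `fit_tolerances([20, 35, 56, 74, 90], uniform_quantile(10, 100))` measures how far each value lies from those positions.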

We can exploit the assumed fact that f(x) is monotonically increasing or decreasing to say that the maximum and minimum of the mean of the transformed values occur when the $x_i$ are all at $\omega_U$, or all at $\omega_L$, from their predicted positions, not necessarily respectively. This is because less than an extreme deviation for one point cannot improve prospects for a more extreme mean; all point deviations are independent of one another, within the tolerances. Hence to find the extreme values of the transformed mean one just calculates the means of

$$q_U(y) = p[f^{-1}(y) - \omega_U] \left|\frac{df^{-1}(y)}{dy}\right| \quad\text{and}\quad q_L(y) = p[f^{-1}(y) - \omega_L] \left|\frac{df^{-1}(y)}{dy}\right|$$

We can use this same approach to get bounds on the standard deviation in the manner of section 4.1. We just define $g(x) = [f(x)]^2$ as a new transformation function, and compute the above formulae with g instead of f. We then compute bounds on the mean, square them, and subtract this interval from the interval computed on the mean of g(x).

11.3. Example of inexact distribution fit

Suppose we know the distribution of the $x_i$ fits an even distribution on the interval 10 to 100, to such an extent that a point is never further than 2 units in advance of where it would be in a perfectly even distribution, and never more than 3 units behind. Then the maximum-mean distribution is a uniform distribution from 12 to 102, and the minimum-mean distribution is a uniform distribution from 7 to 97. Suppose we want to find the mean of the logarithms of these data values. Using the formulae we obtained in section 11.1, the mean of the first distribution is [102 ln(102) - 12 ln(12) - 102 + 12] / (102-12) = (471.7 - 29.8)/90 - 1 = 4.91 - 1 = 3.91; and the mean of the second distribution is [97 ln(97) - 7 ln(7) - 97 + 7] / (97-7) = (443.7 - 13.6)/90 - 1 = 4.78 - 1 = 3.78. Hence the mean of the transformed values is between 3.78 and 3.91, corresponding to antilogs of 44 and 50. Note the mean of the original values must lie between (102+12)/2 = 57 and (97+7)/2 = 52.

For an estimate of the standard deviation we use the formula previously derived for an estimate of the mean of the squares (the second moment about zero), namely

$$\frac{M[\ln^2(M) - 2\ln(M) + 2] - m[\ln^2(m) - 2\ln(m) + 2]}{M-m} = \frac{M(\ln(M)-1)^2 - m(\ln(m)-1)^2}{M-m} + 1$$

For the uniform distribution 12 to 102, this is

[102(3.625)^2 - 12(1.485)^2]/90 + 1 = (1340 - 26.5)/90 + 1 = 15.60

and for the uniform distribution 7 to 97 this is

[97(3.575)^2 - 7(.946)^2]/90 + 1 = (1240 - 6.26)/90 + 1 = 14.70

From the previous paragraph we know bounds on the mean of the transformed values are 3.78 and 3.91, hence bounds on the square of the mean are 14.3 and 15.3. Hence bounds on the variance are 15.60-14.3 = 1.3 and max(14.70-15.3, 0) = 0. Hence bounds on the standard deviation of the transformed values are 1.15 and 0.
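The arithmetic of this example can be reproduced mechanically. The sketch below is a hedged illustration for the logarithm of a shifted uniform distribution only; the function names and argument order are assumptions. It evaluates the section 11.1 closed forms on the two shifted intervals and then applies the interval subtraction for the variance described in section 11.2.

```python
import math


def uniform_log_moments(lo, hi):
    """Mean and second moment about zero of ln(x) for x uniform on [lo, hi]."""
    mean = (hi * math.log(hi) - lo * math.log(lo)) / (hi - lo) - 1.0
    second = (hi * (math.log(hi) - 1.0) ** 2
              - lo * (math.log(lo) - 1.0) ** 2) / (hi - lo) + 1.0
    return mean, second


def log_bounds_from_fit(a, b, w_upper, w_lower):
    """Bounds on the mean and standard deviation of ln(x).

    The data are assumed to fit a uniform distribution on [a, b] with upper
    fit w_upper and lower fit w_lower (w_lower is typically negative).
    Returns ((mean_low, mean_high), (sd_low, sd_high)).
    """
    mean_hi, second_hi = uniform_log_moments(a + w_upper, b + w_upper)
    mean_lo, second_lo = uniform_log_moments(a + w_lower, b + w_lower)
    # Interval subtraction: largest second moment minus smallest squared mean,
    # and smallest second moment minus largest squared mean (clamped at zero).
    var_hi = second_hi - mean_lo ** 2
    var_lo = max(second_lo - mean_hi ** 2, 0.0)
    return (mean_lo, mean_hi), (math.sqrt(var_lo), math.sqrt(var_hi))
```

The example in the text corresponds to `log_bounds_from_fit(10, 100, 2, -3)`.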


12.  Small  populations 

Thus far we have not made use of the size of the data population being analyzed. This is only significant if the population is particularly small, in which case the known maximum M and minimum m (and the median and mode too, if known) are a nonnegligible proportion of the points of the distribution. For instance, the linear bounds represent in general the two extreme cases where (a) all the points are grouped at the mean, and (b) all the points are at the maximum and the minimum. Knowledge of M and m thus decreases the distance between linear bounds by a factor of 2/n, where n is the size of the data population, since it represents a weighted modification of case (a) by two points from case (b).

13.  Some  experimental  comparisons  of  the  various  bounds  formulae 

We have run some simple experiments on the effectiveness of our bounds formulae on the mean of the transformed values. We wrote programs in INTERLISP-VAX. We used two test functions, f(x) = ln(x) and f(x) = 1/x. For the experiments we computed upper and lower bounds derived in the following ways:

•  simple linear bounds (section 3)

•  Taylor-series quadratic bounds, series around the mean (section 5)

•  Lagrange-Chebyshev interpolation quadratic bounds (section 6)

•  For the reciprocal only, the one-sided quadratic bounds (section 7)

•  Order-statistic bounds from the Chebyshev inequality, using a Taylor series around the mean (section 10.5)

•  Best quadratic bounds found by explicit optimization on quadratic coefficients a and b (section 8):

upper bound: $a(\sigma^2 + \mu^2) + b\mu + c + \max_{m \le x \le M}[f(x) - ax^2 - bx - c]$
lower bound: $a(\sigma^2 + \mu^2) + b\mu + c + \min_{m \le x \le M}[f(x) - ax^2 - bx - c]$

We discovered that our results for optimal bounds for the reciprocal curve were identical (except for roundoff error) to those for one-sided bounds, so we have omitted the former from the reciprocal table. Unfortunately, we have been unable to prove the connection (that is, that the one-sided bounds are indeed the optimal ones), though we strongly suspect it.

Results  are  contained  in  figures  13-1  and  13-2.  Since  the  closed-form  expressions  are  simple  computations,  in 
a  computer  implementation  it  is  advisable  to  try  all  the  different  bounds  methods,  and  take  the  minimum  of 
the  upper  bounds  to  get  a  cumulative  upper  bound,  and  the  maximum  of  the  lower  bounds  to  get  a 
cumulative  lower  bound. 
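The quadratic bound formula and the advice to combine methods can be sketched together in Python. This is an illustration under stated assumptions, not the original INTERLISP implementation: the coefficient c is dropped because it cancels in both formulas, the maximum and minimum over the interval are approximated on an evenly spaced grid, and a crude grid search over hypothetical lists of candidate a and b values stands in for a real optimizer. All function names are invented for the example.

```python
import math


def quadratic_bounds(f, a, b, mu, sigma, m, M, steps=500):
    """(lower, upper) bounds on the mean of f(x) from the quadratic a x^2 + b x.

    The max and min of the residual f(x) - a x^2 - b x over [m, M] are
    approximated on an evenly spaced grid that includes both endpoints.
    """
    xs = [m + (M - m) * k / steps for k in range(steps + 1)]
    residual = [f(x) - a * x * x - b * x for x in xs]
    base = a * (sigma ** 2 + mu ** 2) + b * mu
    return base + min(residual), base + max(residual)


def combined_bounds(candidates):
    """Cumulative bounds: the largest lower bound and the smallest upper bound."""
    lowers = [lo for lo, _ in candidates]
    uppers = [up for _, up in candidates]
    return max(lowers), min(uppers)


def grid_search_bounds(f, mu, sigma, m, M, a_values, b_values):
    """Try every (a, b) pair and keep the tightest bounds found."""
    candidates = [quadratic_bounds(f, a, b, mu, sigma, m, M)
                  for a in a_values for b in b_values]
    return combined_bounds(candidates)
```

Because every (a, b) pair gives a valid pair of bounds (up to the grid approximation), the same `combined_bounds` helper can also merge results from entirely different methods, such as the linear and Taylor-series bounds.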

14.  Application  to  correlated  data 

An  application  of  these  ideas  is  to  estimation  of  statistics  of  one  attribute  from  those  of  another  if  the 
attributes  are  known  to  have  a  nonlinear  correlation  describable  by  a  monotonic  function  such  as  we  have 
been  analyzing.  We  can  then  bound  statistics  on  one  attribute  from  statistics  on  the  other. 

15.  Direct  optimization 

We  should  note  there  is  another  kind  of  optimization  that  can  be  applied  to  problems  of  this  sort.  We  can 
make  the  optimization  variables  the  values  themselves  of  an  unknown  distribution  and  perform  a  constrained 
optimization  with  objective  function  the  statistic  on  which  bounds  are  desired,  and  with  constraints  the  values 
of  known  other  statistics.  Conceptually,  this  is  a  nice  approach  since  it  can  be  applied  to  arbitrary  states  of 
prior  knowledge  and  can  bound  arbitrary  statistics. 

We have done a number of experiments which we do not have the space here to discuss, and the idea seems to work. However, we have found that this "direct optimization" is highly sensitive to optimization methods, starting points, and step sizes, and it is surprisingly difficult to get convergence; unlike quadratic optimization, the function optimized is not usually convex. But there is an even more serious and fundamental problem with direct optimization: it only gives lower bounds on upper bounds, and upper bounds on lower bounds, unlike all the other bounds discussed in this paper, which are upper bounds on upper bounds, and lower bounds on lower bounds. For instance, for our standard example we found a lower bound on the upper bound of 3.09771 on the mean of the logarithms from direct optimization, but we have no idea how much larger a bound is possible up to the quadratic-optimization bound of 3.10383, which represents an absolute limit. Thus the utility of direct optimization is questionable in bounded statistical estimation, and we do not see it as a challenge to the methods developed in this paper. (It does provide a useful tool for debugging the methods, however, since for instance any supposed bound we find less than the upper bound
on the lower bound is in error.)

Figure 13-1: Some comparisons between different expressions for bounds on the mean, for f(x) = ln(x)

For these results the quadratic optimum was verified to be equal to the one-sided bound when allowing for roundoff error.

Figure 13-2: Some comparisons between different expressions for bounds on the mean, for f(x) = 1/x

16. Conclusion

We  have  developed  some  quick  closed-form  expressions  for  bounds  on  the  mean  and  standard  deviation  of 
a  finite  set  of  transformed  numerical  data  values,  where  the  transformation  function  has  derivatives  of 
constant  sign  in  the  interval  of  interest.  In  making  these  estimates  we  use  only  statistics  on  the  original  set  of 
data  values,  and  no  actual  values  themselves.  Our  bounds  provide  a  useful  alternative  to  often  difficult-to- 
obtain  confidence  intervals,  requiring  no  distributional  assumptions  whatsoever.  Such  bounds  are  likely  to  be 
helpful  for  exploratory  data  analysis  as  an  aid  to  getting  a  feel  for  the  data,  preliminary  to  detailed  hypothesis 
testing.